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Topics in Compressed Sensing 
Abstract 

Compressed sensing has a wide range of applications that include error correc- 
tion, imaging, radar and many more. Given a sparse signal in a high dimensional 
space, one wishes to reconstruct that signal accurately and efficiently from a number 
of linear measurements much less than its actual dimension. Although in theory it 
is clear that this is possible, the difficulty hes in the construction of algorithms that 
perform the recovery efficiently, as well as determining which kind of linear measure- 
ments allow for the reconstruction. There have been two distinct major approaches 
to sparse recovery that each present different benefits and shortcomings. The first, 
£i-minimization methods such as Basis Pursuit, use a linear optimization problem to 
recover the signal. This method provides strong guarantees and stability, but relies on 
Linear Programming, whose methods do not yet have strong polynomially bounded 
runtimes. The second approach uses greedy methods that compute the support of 
the signal iteratively. These methods are usually much faster than Basis Pursuit, but 
until recently had not been able to provide the same guarantees. This gap between 
the two approaches was bridged when we developed and analyzed the greedy algo- 
rithm Regularized Orthogonal Matching Pursuit (ROMP). ROMP provides similar 
guarantees to Basis Pursuit as well as the speed of a greedy algorithm. Our more 
recent algorithm Compressive Sampling Matching Pursuit (CoSaMP) improves upon 
these guarantees, and is optimal in every important aspect. Recent work has also 
been done on a reweighted version of the £i-minimization method that improves upon 
the original version in the recovery error and measurement requirements. These al- 
gorithms are discussed in detail, as well as previous work that serves as a foundation 
for sparse signal recovery. 
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Chapter 1 
Introduction 

1.1 Overview 
1.1.1 Main Idea 

The phrase compressed sensing refers to the problem of reahzing a sparse input x 
using few hnear measurements that possess some incoherence properties. The field 
originated recently from an unfavorable opinion about the current signal compression 
methodology. The conventional scheme in signal processing, acquiring the entire sig- 
nal and then compressing it, was questioned by Donoho [20]. Indeed, this technique 
uses tremendous resources to acquire often very large signals, just to throw away 
information during compression. The natural question then is whether we can com- 
bine these two processes, and directly sense the signal or its essential parts using few 
linear measurements. Recent work in compressed sensing has answered this question 
in positive, and the field continues to rapidly produce encouraging results. 

The key objective in compressed sensing (also referred to as sparse signal recovery 
or compressive sampling) is to reconstruct a signal accurately and efficiently from a 
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set of few non-adaptive linear measurements. Signals in this context are vectors, 
many of which in the applications will represent images. Of course, linear algebra 
easily shows that in general it is not possible to reconstruct an arbitrary signal from 
an incomplete set of linear measurements. Thus one must restrict the domain in 
which the signals belong. To this end, we consider sparse signals, those with few 
non-zero coordinates. It is now known that many signals such as real-world images 
or audio signals are sparse either in this sense, or with respect to a different basis. 

Since sparse signals lie in a lower dimensional space, one would think intuitively 
that they may be represented by few linear measurements. This is indeed correct, 
but the difficulty is determining in which lower dimensional subspace such a signal 
lies. That is, we may know that the signal has few non-zero coordinates, but we do 
not know which coordinates those are. It is thus clear that we may not reconstruct 
such signals using a simple linear operator, and that the recovery requires more 
sophisticated techniques. The compressed sensing field has provided many recovery 
algorithms, most with provable as well as empirical results. 

There are several important traits that an optimal recovery algorithm must pos- 
sess. The algorithm needs to be fast, so that it can efficiently recover signals in 
practice. Of course, minimal storage requirements as well would be ideal. The al- 
gorithm should provide uniform guarantees, meaning that given a specific method 
of acquiring linear measurements, the algorithm recovers all sparse signals (possibly 
with high probability). Ideally, the algorithm would require as few linear measure- 
ments as possible. Linear algebra shows us that if a signal has s non-zero coordinates, 
then recovery is theoretically possible with just 2s measurements. However, recovery 
using only this property would require searching through the exponentially large set 
of all possible lower dimensional subspaces, and so in practice is not numerically fea- 



1.1. Overview 



3 



sible. Thus in the more reahstic setting, we may need shghtly more measurements. 
Finally, we wish our ideal recovery algorithm to be stable. This means that if the 
signal or its measurements are perturbed slightly, then the recovery should still be 
approximately accurate. This is essential, since in practice we often encounter not 
only noisy signals or measurements, but also signals that are not exactly sparse, but 
close to being sparse. For example, compressible signals are those whose coefficients 
decay according to some power law. Many signals in practice are compressible, such 
as smooth signals or signals whose variations are bounded. 

1.1.2 Problem Formulation 

Since we will be looking at the reconstruction of sparse vectors, we need a way to 
quantify the sparsity of a vector. We say that a d-dimensional signal x is s-sparse if 
it has s or fewer non-zero coordinates: 

X e W^, \\x\\o := |supp(x)| < s < d, 

where we note that || ■ ||o is a quasi-norm. For 1 < p < oo, we denote by || ■ ||p the 
usual p-norm, 

d 

1=1 

and ||x||oo = max \ xi\. In practice, signals are often encountered that are not exactly 
sparse, but whose coefficients decay rapidly. As mentioned, compressible signals are 
those satisfying a power law decay: 

\x*\ < Ri^-^/i\ (1.1.1) 
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where x* is a non-increasing rearrangement of x, is some positive constant, and 
< g < 1. Note that in particular, sparse signals are compressible. 

Sparse recovery algorithms reconstruct sparse signals from a small set of non- 
adaptive linear measurements. Each measurement can be viewed as an inner product 



measurements in this way, we may then consider the m x d measurement matrix $ 
whose columns are the vectors 0j. We can then view the sparse recovery problem as 
the recovery of the s-sparse signal x G M'^ from its measurement vector u = G M™". 
One of the theoretically simplest ways to recover such a vector from its measurements 
u = ^x is to solve the ^o-niinimization problem 



If X is s-sparse and $ is one-to-one on all 2s-sparse vectors, then the minimizer 
to ( ]1.1.2p must be the signal x. Indeed, if the minimizer is z, then since x is a feasible 
solution, z must be s-sparse as well. Since ^z = u, z — x must be in the kernel of $. 
But z — X is 2s-sparse, and since $ is one-to-one on all such vectors, we must have 
that z = X. Thus this ^o-niinimization problem works perfectly in theory. However, 
it is not numerically feasible and is NP-Hard in general [l9l Sec. 9.2.2]. 

Fortunately, work in compressed sensing has provided us numerically feasible 
alternatives to this NP-Hard problem. One major approach. Basis Pursuit, relaxes 
the ^o-ininimization problem to an £i-minimization problem. Basis Pursuit requires a 
condition on the measurement matrix $ stronger than the simple injectivity on sparse 

vectors, but many kinds of matrices have been shown to satisfy this condition with 

^Although similar results hold for measurements taken over the complex numbers, for simplicity 
of presentation we only consider real numbers throughout. 



with the signal x G M'^ and some vector 




If we collect m 



min ||2;||o subject to ^z = u. 
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number of measurements m = s log*^'^^-' d. The £i-minimization approach provides 
uniform guarantees and stabihty, but rehes on methods in Linear Programming. 
Since there is yet no known strongly polynomial bound, or more importantly, no linear 
hound on the runtime of such methods, these approaches are often not optimally fast. 

The other main approach uses greedy algorithms such as Orthogonal Matching 
Pursuit [62j, Stagewise Orthogonal Matching Pursuit [23], or Iterative Threshold- 
ing [271 E]- Many of these methods calculate the support of the signal iteratively. 
Most of these approaches work for specific measurement matrices with number of 
measurements m = 0{s \ogd). Once the support S of the signal has been calculated, 
the signal x can be reconstructed from its measurements u = as x = ($5)^^, 
where $5 denotes the measurement matrix $ restricted to the columns indexed by 
5* and f denotes the pseudo inverse. Greedy approaches are fast, both in theory and 
practice, but have lacked both stability and uniform guarantees. 

There has thus existed a gap between the approaches. The £i-minimization meth- 
ods have provided strong guarantees but have lacked in optimally fast runtimes, while 
greedy algorithms although fast, have lacked in optimal guarantees. We bridged this 
gap in the two approaches with our new algorithm Regularized Orthogonal Matching 
Pursuit (ROMP). ROMP provides similar uniform guarantees and stability results 
as those of Basis Pursuit, but is an iterative algorithm so also provides the speed of 
the greedy approach. Our next algorithm. Compressive Sampling Matching Pursuit 
(CoSaMP) improves upon the results of ROMP, and is the first algorithm in sparse 
recovery to be provably optimal in every important aspect. 
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1.2 Applications 

The sparse recovery problem has applications in many areas, ranging from medicine 
and coding theory to astronomy and geophysics. Sparse signals arise in practice in 
very natural ways, so compressed sensing lends itself well to many settings. Three 
direct applications are error correction, imaging, and radar. 

1.2.1 Error Correction 

When signals are sent from one party to another, the signal is usually encoded and 
gathers errors. Because the errors usually occur in few places, sparse recovery can be 
used to reconstruct the signal from the corrupted encoded data. This error correction 
problem is a classic problem in coding theory. Coding theory usually assumes the data 
values live in some finite field, but there are many practical applications for encoding 
over the continuous reals. In digital communications, for example, one wishes to 
protect results of onboard computations that are real-valued. These computations 
are performed by circuits that experience faults caused by effects of the outside world. 
This and many other examples are difficult real- world problems of error correction. 

The error correction problem is formulated as follows. Consider a m-dimensional 
input vector / e M*" that we wish to transmit reliably to a remote receiver. In coding 
theory, this vector is referred to as the "plaintext." We transmit the measurements 
z = Af (or "ciphertext" ) where A is the d x m measurement matrix, or the linear 
code. It is clear that if the linear code A has full rank, we can recover the input 
vector / from the ciphertext z. But as is often the case in practice, we consider the 
setting where the ciphertext z has been corrupted. We then wish to reconstruct the 
input signal / from the corrupted measurements z' — Af + e where e e M'^ is the 
sparse error vector. To realize this in the usual compressed sensing setting, consider 



1.2. Applications 



7 



a matrix B whose kernel is the range of A. Apply B to both sides of the equation 
z' = Af + e to get Bz' = Be. Set y = Bz' and the problem becomes reconstructing 
the sparse vector e from its linear measurements y. Once we have recovered the error 
vector e, we have access to the actual measurements Af and since A is full rank can 
recover the input signal /. 

1.2.2 Imaging 

Many images both in nature and otherwise are sparse with respect to some basis. 
Because of this, many applications in imaging are able to take advantage of the tools 
provided by Compressed Sensing. The typical digital camera today records every 
pixel in an image before compressing that data and storing the compressed image. 
Due to the use of silicon, everyday digital cameras today can operate in the megapixel 
range. A natural question asks why we need to acquire this abundance of data, just 
to throw most of it away immediately. This notion sparked the emerging theory of 
Compressive Imaging. In this new framework, the idea is to directly acquire random 
linear measurements of an image without the burdensome step of capturing every 
pixel initially. 

Several issues from this of course arise. The first problem is how to reconstruct the 
image from its random linear measurements. The solution to this problem is provided 
by Compressed Sensing. The next issue lies in actually sampling the random linear 
measurements without first acquiring the entire image. Researchers [25] are working 
on the construction of a device to do just that. Coined the "single-pixel" compressive 
sampling camera, this camera consists of a digital micromirror device (DMD), two 
lenses, a single photon detector and an analog-to-digital (A/D) converter. The first 
lens focuses the light onto the DMD. Each mirror on the DMD corresponds to a pixel 
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in the image, and can be tilted toward or away from the second lens. This operation 
is analogous to creating inner products with random vectors. This light is then 
collected by the lens and focused onto the photon detector where the measurement 
is computed. This optical computer computes the random linear measurements of 
the image in this way and passes those to a digital computer that reconstructs the 
image. 

Since this camera utilizes only one photon detector, its design is a stark contrast to 
the usual large photon detector array in most cameras. The single-pixel compressive 
sampling camera also operates at a much broader range of the light spectrum than 
traditional cameras that use silicon. For example, because silicon cannot capture a 
wide range of the spectrum, a digital camera to capture infrared images is much more 
complex and costly. 

Compressed Sensing is also used in medical imaging, in particular with magnetic 
resonance (MR) images which sample Fourier coefficients of an image. MR images 
are implicitly sparse and can thus capitalize on the theories of Compressed Sensing. 
Some MR images such as angiograms are sparse in their actual pixel representation, 
whereas more complicated MR images are sparse with respect to some other basis, 
such as the wavelet Fourier basis. MR imaging in general is very time costly, as the 
speed of data collection is limited by physical and physiological constraints. Thus 
it is extremely beneficial to reduce the number of measurements collected without 
sacrificing quality of the MR image. Compressed Sensing again provides exactly this, 
and many Compressed Sensing algorithms have been specifically designed with MR 
images in mind [361 US] ■ 
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1.2.3 Radar 

There are many other apphcations to compressed sensing (see [I3]), and one addi- 
tional application is Compressive Radar Imaging. A standard radar system transmits 
some sort of pulse (for example a linear chirp), and then uses a matched filter to corre- 
late the signal received with that pulse. The receiver uses a pulse compression system 
along with a high-rate analog to digital (A/D) converter. This conventional approach 
is not only complicated and expensive, but the resolution of targets in this classical 
framework is limited by the radar uncertainty principle. Compressive Radar Imag- 
ing tackles these problems by discretizing the time-frequency plane into a grid and 
considering each possible target scene as a matrix. If the number of targets is small 
enough, then the grid will be sparsely populated, and we can employ Compressed 
Sensing techniques to recover the target scene. See [H [39] for more details. 
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Chapter 2 



Major Algorithmic Approaches 



Compressed Sensing has provided many methods to solve the sparse recovery prob- 
lem and thus its applications. There are two major algorithmic approaches to this 
problem. The first relies on an optimization problem which can be solved using lin- 
ear programming, while the second approach takes advantage of the speed of greedy 
algorithms. Both approaches have advantages and disadvantages which are discussed 
throughout this chapter along with descriptions of the algorithms themselves. First 
we discuss Basis Pursuit, a method that utilizes a linear program to solve the sparse 
recovery problem. 

2.1 Basis Pursuit 

Recall that sparse recovery can be formulated as the generally NP-Hard problem (11. 1.21) 
to recover a signal x. Donoho and his collaborators showed (see e.g. pi]) that for 
certain measurement matrices this hard problem is equivalent to its relaxation, 



min subject to <l>z = u. 
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Candes and Tao proved that for measurement matrices satisfying a certain quantita- 
tive property, the programs fll.l.2p and (12.1. ip are equivalent [B]. 

2.1.1 Description 

Since the problem (11.1.20 is not numerically feasible, it is clear that if one is to 
solve the problem efficiently, a different approach is needed. At first glance, one 
may instead wish to consider the mean square approach, based on the minimization 
problem, 

min||z||2 subject to = u. (2.1.2) 

Since the minimizer x* must satisfy $x* = u = $x, the minimizer must be in the 
subspace K = x + ker$. In fact, the minimizer x* to (12.1.20 is the contact point 
where the smallest Euclidean ball centered at the origin meets the subspace K. As is 
illustrated in Figure 12.1.11 this contact point need not coincide with the actual signal 
X. This is because the geometry of the Euclidean ball does not lend itself well to 
detecting sparsity. 

We may then wish to consider the £i-minimization problem (I2.1.ip . In this case, 
the minimizer x* to (I2.1.ip is the contact point where the smallest octahedron cen- 
tered at the origin meets the subspace K. Since x is sparse, it lies in a low-dimensional 
coordinate subspace. Thus the octahedron has a wedge at x (see Figure I2.1.ip . which 
forces the minimizer x* to coincide with x for many subspaces K. 

Since the £i-ball works well because of its geometry, one might think to then 
use an ip ball for some < p < 1. The geometry of such a ball would of course 
lend itself even better to sparsity. Indeed, some work in compressed sensing has 
used this approach (see e.g. [28[ [IZ]), however, recovery using such a method has 
not yet provided optimal results. The program (I2.1.ip has the advantage over those 
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with p < 1 because linear programming can be used to solve it. See Section 12.1.51 
for a discussion on linear programming. Basis Pursuit utilizes the geometry of the 
octahedron to recover the sparse signal x using measurement matrices $ that satisfy 
a deterministic property. 




Figure 2.1.1: The minimizers to the mean square (left) and ii (right) approaches. 
2.1.2 Restricted Isometry Condition 

As discussed above, to guarantee exact recovery of every s-sparse signal, the measure- 
ment matrix $ needs to be one-to-one on all 2s-sparse vectors. Candes and Tao [6] 
showed that under a slightly stronger condition. Basis Pursuit can recover every s- 
sparse signal by solving fl2.1.ip . To this end, we say that the restricted isometry 
condition (RIC) holds with parameters (r, 6) if 

(1 - S)\\x\\2 < \\<^X\\2 < (1 + S)\\x\\2 (2.1.3) 

holds for all r-sparse vectors x. Often, the quadratic form 

{1 - S)\\x\\l < \\'i>x\\l < (1 + 6)\\x\\l (2.1.4) 
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is used for simplicity. Often the notation 6r is used to denote the smallest 6 for 
which the above holds for all r-sparse signals. Now if we require 6 to be small, this 
condition essentially means that every subset of r or fewer columns of $ is approxi- 
mately an orthonormal system. Note that if the restricted isometry condition holds 
with parameters (2s, 1), then $ must be one-to-one on all s-sparse signals. Indeed, if 
$a; = $2; for two s-sparse vectors x and z, then — z) = 0, so by the left inequality, 
ll^; — -2||2 = 0. To use this restricted isometry condition in practice, we of course need 
to determine what kinds of matrices have small restricted isometry constants, and 
how many measurements are needed. Although it is quite difficult to check whether 
a given matrix satisfies this condition, it has been shown that many matrices satisfy 
the restricted isometry condition with high probability and few measurements. In 
particular, it has been show that with exponentially high probability, random Gaus- 
sian, Bernoulli, and partial Fourier matrices satisfy the restricted isometry condition 
with number of measurements nearly linear in the sparsity level. 

Subgaussian matrices: A random variable X is subgaussian if P(|X| > t) < Ce~'^^^ 
for all t > and some positive constants C, c. Thus subgaussian random 
variables have tail distributions that are dominated by that of the standard 
Gaussian random variable. Choosing C = c = 1, we trivially have that stan- 
dard Gaussian matrices (those whose entries are Gaussian) are subgaussian. 
Choosing C = ^ and c = 1, we see that Bernoulli matrices (those whose entries 
are uniform ±1) are also subgaussian. More generally, any bounded random 
variable is subgaussian. The following theorem proven in [18] shows that any 
subgaussian measurement matrix satisfies the restricted isometry condition with 
number of measurements m nearly linear in the sparsity s. 

Theorem 2.1.1 (Subgaussian measurement matrices). Let $ be a m x d sub- 
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gaussian measurement matrix, and let s > 1, < 6 < 1, and < e < 0.5. Then 
with probability 1 — a the matrix satisfies the restricted isometry condition 
with parameters {s,e) provided that the number of measurements m satisfies 

Cs ^ yd. 

where C depends only on a and other constants in the definition of subgaussian 
(for details on the dependence, see fJS^ ). 

Partial bounded orthogonal matrices: Let \1/ be an orthogonal d x d matrix 
whose entries are bounded by C/\/d for some constant C. A m x d partial 
bounded orthogonal matrix is a matrix $ formed by choosing m rows of such 
a matrix \& uniformly at random. Since the d x d discrete Fourier transform 
matrix is orthogonal with entries bounded by 1/ \fd, the m x d random partial 
Fourier matrix is a partial bounded orthogonal matrix. The following theorem 
proved in [52] shows that such matrices satisfy the restricted isometry condition 
with number of measurements m nearly linear in the sparsity s. 

Theorem 2.1.2 (Partial bounded orthogonal measurement matrices). Let $ 

be a m X d partial bounded orthogonal measurement matrix, and let s > 1, 
< 5 < 1, and < e < 0.5. Then with probability 1 — a the matrix 
satisfies the restricted isometry condition with parameters {s, e) provided that 
the number of measurements m satisfies 

.slogd. ,s\ogd. 2> 
m > C{^^) log (^^) log d, 

where C depends only on the confidence level a and other constants in the 
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definition of partial bounded orthogonal matrix (for details on the dependence, 
see JSS). 

2.1.3 Main Theorems 

Candes and Tao showed in [6] that for measurement matrices that satisfy the re- 
stricted isometry condition, Basis Pursuit recovers all sparse signals exactly. This is 
summarized in the following theorem. 

Theorem 2.1.3 (Sparse recovery under RIC [6]). Assume that the measurement 
matrix $ satisfies the restricted isometry condition with parameters (3s, 0.2). Then 
every s-sparse vector x can be exactly recovered from its measurements u = $x as a 
unique solution to the linear optimization problem fl2.1.ip . 

Note that these guarantees are uniform. Once the measurement matrix $ satisfies 
the restricted isometry condition. Basis Pursuit correctly recovers all sparse vectors. 

As discussed earlier, exactly sparse vectors are not encountered in practice, but 
rather nearly sparse signals. The signals and measurements are also noisy in practice, 
so practitioners seek algorithms that perform well under these conditions. Candes, 
Romberg and Tao showed in [5j that a version of Basis Pursuit indeed approximately 
recovers signals contaminated with noise. It is clear that in the noisy case, (12.1.11) is 
not a suitable method since the exact equality in the measurements would be most 
likely unattainable. Thus the method is modified slightly to allow for small pertur- 
bations, searching over all signals consistent with the measurement data. Instead of 
(]2.1.ip . we consider the formulation 



(2.1.5) 
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Candes, Romberg and Tao showed that the program fl2.1.5p reconstructs the signal 
with error at most proportional to the noise level. First we consider exactly sparse 
signals whose measurements are corrupted with noise. In this case, we have the 
following results from [5]. 

Theorem 2.1.4 (Stability of BP [5]). Let ^ be a measurement matrix satisfying the 
restricted isometry condition with parameters (3s, 0.2). Then for any s-sparse signal 
X and corrupted measurements u = $x + e with ||e||2 < £, the solution x to (12.1.51) 
satisfies 

\\x - x\\2 < Cs- e, 

where Cg depends only on the RIC constant 6. 

Note that in the noiseless case, this theorem is consistent with Theorem 12.1.31 
Theorem l2.1.4l is quite surprising given the fact that the matrix $ is a wide rectangular 
matrix. Since it has far more columns than rows, most of the singular values of $ 
are zero. Thus this theorem states that even though the problem is very ill-posed. 
Basis Pursuit still controls the error. It is important to point out that Theorem 12. 1.41 
is fundamentally optimal. This means that the error level e is in a strong sense 
unrecoverable. Indeed, suppose that the support S of x was known a priori. The best 
way to reconstruct x from the measurements u = $x + e in this case would be to 
apply the pseudoinverse $^ = ($^^$5)"^$^ on the support, and set the remaining 
coordinates to zero. That is, one would reconstruct x as 



Since the singular values of $5 are controlled, the error on the support is approxi- 




elsewhere 



on S 
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mately ||e||2 < s, and the error off the support is of course zero. This is also the error 
guaranteed by Theorem I2.1.4[ Thus no recovery algorithm can hope to recover with 
less error than the original error introduced to the measurements. 

Thus Basis Pursuit is stable to perturbations in the measurements of exactly 
sparse vectors. This extends naturally to the approximate recovery of nearly sparse 
signals, which is summarized in the companion theorem from |5]. 

Theorem 2.1.5 (Stability of BP II [5J). Let ^ be a measurement matrix satisfying 
the restricted isometry condition with parameters (3s, 0.2). Then for any arbitrary 
signal and corrupted measurements u = ^x+e with \\e\\2 < e, the solution x to (12.1.51) 
satisfies 



where Xs denotes the vector of the largest coefficients in magnitude of x. 

Remark 2.1.6. In 18], Candes sharpened Theorems \2.1.3[. \2.1.4l and \2.1.'S[ to work 
under the restricted isometry condition with parameters (2s, \/2 — 1). 

Theorem 12.1.51 says that for an arbitrary signal x, Basis Pursuit approximately 
recovers its largest s coefficients. In the particularly useful case of compressible 
signals, we have that for signals x obeying (II. 1.11) . the reconstruction satisfies 



where C depends on the RIC constant and the constant R in the compressibility 
definition eqrefcomp. We notice that for such signals we also have 



x-x\\2 <Cs-e + C'^- 



X Xg II \ 




\\x-xh<Cs-e + C's-'^+'/', 



(2.1.6) 



\\Xs -X\\2 < CrS 



q+1/2 
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where Cr depends on R. Thus the error in the approximation guaranteed by Theo- 
rem 12.1.51 is comparable to the error obtained by simply selecting the s largest coef- 
ficients in magnitude of a compressible signal. So at least in the case of compressible 
signals, the error guarantees are again optimal. See Section [2.1.61 for a discussion of 
advantages and disadvantages to this approach. 

2.1.4 Numerical Results 

Many empirical studies have been conducted using Basis Pursuit. Several are in- 
cluded here, other results can be found in [5l[22l[6]. The studies discussed here were 
performed in Matlab, with the help of £i-Magic code by Romberg Jl5j. The code 
is given in Appendix lA.ll In all cases here, the measurement matrix is a Gaussian 
matrix and the ambient dimension d is 256. In the first study, for each trial we 
generated binary signals with support uniformly selected at random as well as an 
independent Gaussian matrix for many values of the sparsity s and number of mea- 
surements m. Then we ran Basis Pursuit on the measurements of that signal and 
counted the number of times the signal was recovered correctly out of 500 trials. The 
results are displayed in Figure 12.1.21 The 99% recovery trend is depicted in Fig- 
ure 12.1.31 This curve shows the relationship between the number of measurements m 
and the sparsity level s to guarantee that correct recovery occurs 99% of the time. 
Note that by recovery, we mean that the estimation error falls below the threshold of 
10^^. Figure [2. 1.41 depicts the recovery error of Basis Pursuit when the measurements 
were perturbed. For this simulation, the signals were again binary (fiat) signals, but 
Gaussian noise was added to the measurements. The norm of the noise was chosen 
to be the constant 1/2. The last figure. Figure [2.1.51 displays the recovery error when 
Basis Pursuit is run on compressible signals. For this study, the Basis Pursuit was run 
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on signals whose coefficients obeyed the power law fll.l.ip . This simulation was run 
with sparsity s = 12, dimension d = 256, and for various values of the compressibility 
constant q. Note that the smaller q is, the more quickly the coefficients decay. 
Percentage of flat signals recovered with BP, d=256 




100 150 
Measurements m 



250 



Figure 2.1.2: The percentage of sparse flat signals exactly recovered by Basis Pursuit 
as a function of the number of measurements m in dimension d = 256 for various 
levels of sparsity s. 



2.1.5 Linear Programming 

Linear programming is a technique for optimization of a linear objective function 
under equality and inequality constraints, all of which are linear [TJ]. The prob- 
lem ( 12.1.11) can be recast as a linear program whose objective function (to be mini- 
mized) is 

d 



with constraints 



1=1 



-ti < -2i < ti, = U. 
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99% Recovery Trend with BP, d=256 
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Figure 2.1.3: The 99% recovery trend of Basis Pursuit as a function of the number 
of measurements m in dimension d = 256 for various levels of sparsity s. 
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Figure 2.1.4: The recovery error of Basis Pursuit under perturbed measurements as 
a function of the number of measurements m in dimension d = 256 for various levels 
of sparsity s. 
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Normalized Recovery error on compressible signals with BP, d=256, q=0.6 
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Figure 2.1.5: The normalized recovery error ii!^7^^^^ii of Basis Pursuit for compress- 
ible signals as a function of the number of measurements m in dimension d = 256 
with compressibility q = 0.6 for various levels of sparsity s. 

Viewed geometrically, the set of linear constraints, making up the feasible region, 
forms a convex polyhedron. By the Karush-Kuhn- Tucker conditions [H], all local 
optima are also global optima. If an optimum exists, it will be attained at a vertex 
of the polyhedron. There are several methods to search for this optimal vertex. 

One of the most popular algorithms in linear programming is the simplex algo- 
rithm, developed by George Dantzig [15]. The simplex method begins with some 
admissible starting solution. If such a point is not known, a different linear program 
(with an obvious admissible solution) can be solved via the simplex method to find 
such a point. The simplex method then traverses the edges of the polytope via a 
sequence of pivot steps. The algorithm moves along the polytope, at each step choos- 
ing the optimal direction, until the optimum is found. Assuming that precautions 
against cycling are taken, the algorithm is guaranteed to find the optimum. Although 
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it's worst-case behavior is exponential in the problem size, it is much more efficient 
in practice. Smoothed analysis has explained this efficiency in practice [65j, but it is 
still unknown whether there is a strongly polynomial bound on the runtime. 

The simplex algorithm traverses the polytope's edges, but an alternative method 
called the interior point method traverses the interior of the polytope [56]. One 
such method was proposed by Karmarkar and is an interior point projective method. 
Recently, barrier function or path-following methods are being used for practical 
purposes. The best bound currently attained on the runtime of an interior point 
method is 0{m'^d^-^). Other methods have been proposed, including the ellipsoid 
method by Khachiyan which has a polynomial worst case runtime, but as of yet none 
have provided strongly polynomial bounds. 

2.1.6 Summary 

Basis Pursuit presents many advantages over other algorithms in compressed sensing. 
Once a measurement matrix satisfies the restricted isometry condition. Basis Pursuit 
reconstructs all sparse signals. The guarantees it provides are thus uniform, meaning 
the algorithm will not fail for any sparse signal. Theorem 12.1.51 shows that Basis 
Pursuit is also stable, which is a necessity in practice. Its ability to handle noise and 
non-exactness of sparse signals makes the algorithm applicable to real world problems. 
The requirements on the restricted isometry constant shown in Theorem 12.1.51 along 
with the known results about random matrices discussed in Sect ion [2 . 1 . 21 mean that 
Basis Pursuit only requires 0{slogd) measurements to reconstruct ci-dimensional 
s-sparse signals. It is thought by many that this is the optimal number of measure- 
ments. 

Although Basis Pursuit provides these strong guarantees, its disadvantage is of 
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course speed. It relies on Linear Programming which although is often quite efficient 
in practice, has a polynomial runtime. For this reason, much work in compressed 
sensing has been done using faster methods. This approach uses greedy algorithms, 
which are discussed next. 

2.2 Greedy Methods 

An alternative approach to compressed sensing is the use of greedy algorithms. 
Greedy algorithms compute the support of the sparse signal x iteratively. Once 
the support of the signal is compute correctly, the pseudo-inverse of the measure- 
ment matrix restricted to the corresponding columns can be used to reconstruct the 
actual signal x. The clear advantage to this approach is speed, but the approach also 
presents new challenges. 

2.2.1 Orthogonal Matching Pursuit 

One such greedy algorithm is Orthogonal Matching Pursuit (OMP), put forth by 
Mallat and his collaborators (see e.g. [17]) and analyzed by Gilbert and Tropp [62] . 
OMP uses subGaussian measurement matrices to reconstruct sparse signals. If $ 
is such a measurement matrix, then $*$ is in a loose sense close to the identity. 
Therefore one would expect the largest coordinate of the observation vector y = 
to correspond to a non-zero entry of x. Thus one coordinate for the support of the 
signal X is estimated. Subtracting off that contribution from the observation vector 
y and repeating eventually yields the entire support of the signal x. OMP is quite 
fast, both in theory and in practice, but its guarantees are not as strong as those of 
Basis Pursuit. 
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Description 

The OMP algorithm can thus be described as follows: 



Orthogonal Matching Pursuit (OMP) 



Input: Measurement matrix "l>, measurement vector u 


— ^x, sparsity level s 


Output: Index set 7 c {1, ... , d} 




Procedure: 




Initialize Let the index set 7 and the residual r = 


- u. 


Repeat the following s times: 




Identify Select the largest coordinate A of y = $*r in absolute value. Break 


ties lexicographically. 




Update Add the coordinate A to the index set: / <— 


I U {A}, and update the 


residual: 




X = argmin ||m — $ 72;||2; r — 

z 


u — 



Once the support I of the signal x is found, the estimate can be reconstructed as 

X = ^\u, where recall we define the pseudoinvcrsc by $| = (<l>j<l>/)~^$j. 



The algorithm's simplicity enables a fast runtime. The algorithm iterates s times, 
and each iteration does a selection through d elements, multiplies by and solves 
a least squares problem. The selection can easily be done in 0{d) time, and the 
multiplication of in the general case takes 0{md). When $ is an unstructured 
matrix, the cost of solving the least squares problem is 0{s'^d). However, maintaining 
a QR- Factorization of $|/ and using the modified Gram-Schmidt algorithm reduces 
this time to 0{\I\d) at each iteration. Using this method, the overall cost of OMP 
becomes 0{smd). In the case where the measurement matrix $ is structured with a 



2.2. Greedy Methods 



25 



fast-multiply, this can clearly be improved. 
Main Theorems and Results 

Gilbert and Tropp showed that OMP correctly recovers a fixed sparse signal with 
high probability. Indeed, in [62| they prove the following. 

Theorem 2.2.1 (OMP Signal Recovery [62]). Fix 6 e (0, 0.36) and let $ be anmxd 
Gaussian measurement matrix with m > Cmlog{d/6) . Let x he an s-sparse signal in 
W^. Then with probability exceeding 1 — 25, OMP correctly reconstructs the signal x 
from its measurements ^x. 

Similar results hold when $ is a subgaussian matrix. We note here that although 
the measurement requirements are similar to those of Basis Pursuit, the guarantees 
are not uniform. The probability is for a fixed signal rather than for all signals. The 
type of measurement matrix here is also more restrictive, and it is unknown whether 
OMP works for the important case of random Fourier matrices. 

Numerical Experiments 

Many empirical studies have been conducted to study the success of OMP. One 
study is described here that demonstrates the relationship between the sparsity level 
s and the number of measurements m. Other results can be found in p2]. The study 
discussed here was performed in Matlab, and is given in Appendix IA.2[ . In the study, 
for each trial I generated binary signals with support uniformly selected at random 
as well as an independent Gaussian measurement matrix, for many values of the 
sparsity s and number of measurements m. Then I ran OMP on the measurements 
of that signal and counted the number of times the signal was recovered correctly out 
of 500 trials. The results are displayed in Figure 12.2.11 The 99% recovery trend is 
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depicted in Figure I2.2.2I This curve shows the relationship between the number of 
measurements m and the sparsity level s to guarantee that correct recovery occurs 
99% of the time. In comparison with Figures [2. 1.21 and 12. 1.31 we see that Basis Pursuit 
appears to provide stronger results empirically as well. 
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Figure 2.2.1: The percentage of sparse flat signals exactly recovered by OMP as a 
function of the number of measurements m in dimension d = 256 for various levels 
of sparsity s. 



Summary 

It is important to note the distinctions between this theorem for OMP and Theo- 
rem 12.1.31 for Basis Pursuit. The first important difference is that Theorem 12.2.11 
shows that OMP works only for the case when $ is a Gaussian (or subgaussian) 
matrices, whereas Theorem 12.1.31 holds for a more general class of matrices (those 
which satisfy the RIG). Also, Theorem 12.1.31 demonstrates that Basis Pursuit works 
correctly for all signals, once the measurement matrix satisfies the restricted isometry 
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99% Recovery Trend with OMP, d=256 




Measurements m 

Figure 2.2.2: The 99% recovery trend of OMP as a function of the number of mea- 
surements m in dimension d = 256 for various levels of sparsity s. 

condition. Theorem 12.2.11 shows only that OMP works with high probability for each 
fixed signal. The advantage to OMP however, is that its runtime has a much faster 
bound than that of Basis Pursuit and Linear Programming. 

2.2.2 Stagewise Orthogonal Matching Pursuit 

An alternative greedy approach, Stagewise Orthogonal Matching Pursuit (StOMP) 
developed and analyzed by Donoho and his collaborators [23], uses ideas inspired by 
wireless communications. As in OMP, StOMP utilizes the observation vector y = 
where u = is the measurement vector. However, instead of simply selecting the 
largest component of the vector y, it selects all of the coordinates whose values are 
above a specified threshold. It then solves a least-squares problem to update the 
residual. The algorithm iterates through only a fixed number of stages and then 
terminates, whereas OMP requires s iterations where s is the sparsity level. 
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Description 

The pseudo-code for StOMP can thus be described by the following. 

Stagewise Orthogonal Matching Pursuit (StOMP) 
Input: Measurement matrix $, measurement vector u = $x, 

Output: Estimate x to the signal x 

Procedure: 

Initialize Let the index set / = 0, the estimate x = 0, and the residual r = u. 
Repeat the following until stopping condition holds: 

Identify Using the observation vector y = $*r, set 

J = {j ■ IVjl > ^o-fc}, 

where ak is a formal noise level and is a threshold parameter for iteration 
k. 

Update Add the set J to the index set: / ^ / U J, and update the residual and 
estimate: 

x\j = {^*j^iy^^*jU, r = u-^x. 

The thresholding strategy is designed so that many terms enter at each stage, 
and so that algorithm halts after a fixed number of iterations. The formal noise level 
CTfc is proportional the Euclidean norm of the residual at that iteration. See [2^ for 
more information on the thresholding strategy. 
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Main Results 

Donoho and his collaborators studied StOMP empirically and have heuristically de- 
rived results. Figure 6 of [23] shows results of StOMP when the thresholding strategy 
is to control the false alarm rate and the measurement matrix $ is sampled from the 
uniform spherical ensemble. The false alarm rate is the number of incorrectly se- 
lected coordinates (ie. those that are not in the actual support, but are chosen in 
the estimate) divided by the number of coordinates not in the support of the signal 
X. The figure shows that for very sparse signals, the algorithm recovers a good ap- 
proximation to the signal. For less sparse signals the algorithm does not. The red 
curve in this figure is the graph of a heuristically derived function which they call 
the Predicted Phase transition. The simulation results and the predicted transition 
coincide reasonably well. This thresholding method requires knowledge about the 
actual sparsity level s of the signal x. Figure 7 of [23] shows similar results for a 
thresholding strategy that instead tries to control the false discovery rate. The false 
discovery rate is the fraction of incorrectly selected coordinates within the estimated 
support. This method appears to provide slightly weaker results. It appears however, 
that StOMP outperforms OMP and Basis Pursuit in some cases. 

Although the structure of StOMP is similar to that of OMP, because StOMP 
selects many coordinates at each state, the runtime is quite improved. Indeed, us- 
ing iterative methods to solve the least-squares problem yields a runtime bound of 
CNsd + 0{d), where is the fixed number of iterations run by StOMP, and C is a 
constant that depends only on the accuracy level of the least-squares problem. 
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Numerical Experiments 

A thorough empirical study of StOMP is provided in [23]. An additional study on 
StOMP was conducted here using a thresholding strategy with constant threshold 
parameter. The noise level a was proportional to the norm of the residual, as [23] 
suggests. StOMP was run with various sparsity levels and measurement numbers, 
with Gaussian measurement matrices for 500 trials. Figure 12.2.31 depicts the results, 
and Figure [2.2.41 depicts the 99% recovery trend. Next StOMP was run in this same 
way but noise was added to the measurements. Figure 12.2.51 displays the results of 
this study. Since the reconstructed signal is always sparse, it is not surprising that 
StOMP is able to overcome the noise level. Note that these empirical results are 
not optimal because of the basic thresholding strategy. See [23] for empirical results 
using an improved thresholding strategy. 
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Figure 2.2.3: The percentage of sparse flat signals exactly recovered by StOMP as a 
function of the number of measurements m in dimension d = 256 for various levels 
of sparsity s. 
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99% Recovery Trend with StOMP, d=256 
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Figure 2.2.4: The 99% recovery trend of StOMP as a function of the number of 
measurements m in dimension d = 256 for various levels of sparsity s. 
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Figure 2.2.5: The normalized recovery error ||a;— x||2/||e||2 of StOMP on sparse signals 
with noisy measurements $a; + e. 
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Summary 

The empirical results of StOMP in [23] are quite promising, and suggest its improve- 
ment over OMP. However, in practice, the thresholding strategy may be difficult and 
complicated to implement well. More importantly, there are no rigorous results for 
StOMP available. In the next subsection other greedy methods are discussed with 
rigorous results, but that require highly structured measurement matrices. 

2.2.3 Combinatorial Methods 

The major beneffi of the greedy approach is its speed, both empirically and the- 
oretically. There is a group of combinatorial algorithms that provide even faster 
speed, but that impose very strict requirements on the measurement matrix. These 
methods use highly structured measurement matrices that support very fast recon- 
struction through group testing. The work in this area includes HHS pursuit |32] . 
chaining pursuit [31j, Sudocodes [60], Fourier sampling [331 EH] and some others by 
Cormode-Muthukrishnan [12] and Iwen [ID]. 

Descriptions and Results 

Many of the sublinear algorithms such as HHS pursuit, chaining pursuit and Su- 
docodes employ the idea of group testing. Group testing is a method which origi- 
nated in the Selective Service during World War II to test soldiers for Syphilis [21], 
and now it appears in many experimental designs and other algorithms. During this 
time, the Wassermann test [66] was used to detect the Syphilis antigen in a blood 
sample. Since this test was expensive, the method was to sample a group of men 
together and test the entire pool of blood samples. If the pool did not contain the 
antigen, then one test replaced many. If it was found, then the process could either 
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be repeated with that group, or each individual in the group could then be tested. 

These sublinear algorithms in compressed sensing use this same idea to test for 
elements of the support of the signal x. Chaining pursuit, for example, uses a mea- 
surement matrix consisting of a row tensor product of a bit test matrix and an 
isolation matrix, both of which are 0-1 matrices. Chaining pursuit first uses bit tests 
to locate the positions of the large components of the signal x and estimate those 
values. Then the algorithm retains a portion of the coordinates that are largest 
magnitude and repeats. In the end, those coordinates which appeared throughout 
a large portion of the iterations are kept, and the signal is estimated using these. 
Pseudo-code is available in [31] , where the following result is proved. 

Theorem 2.2.2 (Chaining pursuit With probability at least 1 — 0{d~^), the 

0{s\og^ d) X d random measurement operator $ has the following property. For 
X G M*^ and its measurements u = ^x, the Chaining Pursuit algorithm produces a 
signal x with at most s nonzero entries. The output x satisfies 



HHS Pursuit, a similar algorithm but with improved guarantees, uses a mea- 
surement matrix that consists again of two parts. The first part is an identification 
matrix, and the second is an estimation matrix. As the names suggest, the identifi- 
cation matrix is used to identify the location of the large components of the signal, 
whereas the estimation matrix is used to estimate the values at those locations. Each 
of these matrices consist of smaller parts, some deterministic and some random. Us- 
ing this measurement matrix to locate large components and estimate their values. 



X — < C(l + logs)||x — 
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HHS Pursuit then adds the new estimate to the previous, and prunes it relative to 
the sparsity level. This estimation is itself then sampled, and the residual of the 
signal is updated. See [32] for the pseudo-code of the algorithm. Although the mea- 
surement matrix is highly structured, again a disadvantage in practice, the results 
for the algorithm are quite strong. Indeed, in |32j the following result is proved. 




Theorem 2.2.3 (HHS Pursuit [32j). Fix an integer s and a number e G (0, 1). With 
probability at least 0.99, the random measurement matrix ^ (as described above) has 
the following property. Let x E M.'^ and let u = $x be the measurement vector. 
The HHS Pursuit algorithm produces a signal approximation x with O^sje^) nonzero 
entries. The approximation satisfies 



where again Xg denotes the vector consisting of the s largest entries in magnitude 
of X. The number of measurements m is proportional to {s/e^) polylog{d / e) , and 
HHS Pursuit runs in time {s'^ /e^)polylog{d/e) . The algorithm uses working space 
{s / £'^)polylog{d / e) , including storage of the matrix $. 

Remark 2.2.4. This theorem presents guarantees that are stronger than those of 
chaining pursuit. Chaining pursuit, however, still provides a faster runtime. 

There are other algorithms such as the Sudocodes algorithm that as of now only 
work in the noiseless, strictly sparse case. However, these are still interesting because 
of the simplicity of the algorithm. The Sudocodes algorithm is a simple two-phase 
algorithm. In the first phase, an easily implemented avalanche bit testing scheme 
is applied iteratively to recover most of the coordinates of the signal x. At this 
point, it remains to reconstruct an extremely low dimensional signal (one whose 




\\X - X\\2 < 



e 



X X s II 



2.2. Greedy Methods 35 

coordinates are only those that remain). In the second phase, this part of the signal 
is reconstructed, which completes the reconstruction. Since the recovery is two- 
phase, the measurement matrix is as well. For the first phase, it must contain a 
sparse submatrix, one consisting of many zeros and few ones in each row. For the 
second phase, it also contains a matrix whose small submatrices are invertible. The 
following result for strictly sparse signals is proved in [60J. 

Theorem 2.2.5 (Sudocodes [60]). Let x he an s-sparse signal in W^, and let the 

m X d measurement matrix $ be as described above. Then with m = 0{slogd), the 
Sudocodes algorithm exactly reconstructs the signal x with computational complexity 
just 0{s log s log d) . 

The Sudocodes algorithm cannot reconstruct noisy signals because of the lack of 
robustness in the second phase. However, work on modifying this phase to handle 
noise is currently being done. If this task is accomplished Sudocodes would be an 
attractive algorithm because of its sublinear runtime and simple implementation. 

Summary 

Combinatorial algorithms such as HHS pursuit provide sublinear time recovery with 
optimal error bounds and optimal number of measurements. Some of these are 
straightforward and easy to implement, and others require complicated structures. 
The major disadvantage however is the structural requirement on the measurement 
matrices. Not only do these methods only work with one particular kind of measure- 
ment matrix, but that matrix is highly structured which limits its use in practice. 
There are no known sublinear methods in compressed sensing that allow for unstruc- 
tured or generic measurement matrices. 
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Chapter 3 
Contributions 

3.1 Regularized Orthogonal Matching Pursuit 

As is now evident, the two approaches to compressed sensing each presented disjoint 
advantages and challenges. While the optimization method provides robustness and 
uniform guarantees, it lacks the speed of the greedy approach. The greedy methods 
on the other hand had not been able to provide the strong guarantees of Basis Pur- 
suit. This changed when we developed a new greedy algorithm. Regularized Orthog- 
onal Matching Pursuit |55j, that provided the strong guarantees of the optimization 
method. This work bridged the gap between the two approaches, and provided the 
first algorithm possessing the advantages of both approaches. 

3.1.1 Description 

Regularized Orthogonal Matching Pursuit (ROMP) is a greedy algorithm, but will 
correctly recover any sparse signal using any measurement matrix that satisfies the 
Restricted Isometry Condition (12.1.31) . Again as in the case of OMP, we will use 
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the observation vector as a good local approximation to the s-sparse signal 

X. Since the Restricted Isometry Condition guarantees that every s columns of $ 
are close to an orthonormal system, we will choose at each iteration not just one 
coordinate as in OMP, but up to s coordinates using the observation vector. It will 
then be okay to choose some incorrect coordinates, so long as the number of those 
is limited. To ensure that we do not select too many incorrect coordinates at each 
iteration, we include a regularization step which will guarantee that each coordinate 
selected contains an even share of the information about the signal. The ROMP 
algorithm can thus be described as follows: 
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Regularized Orthogonal Matching Pursuit (ROMP) j55] 

Input: Measurement matrix $, measurement vector u = $x, sparsity level s 

Output: Index set / C {!,..., d}, reconstructed vector x = w 

Procedure: 

Initialize Let the index set / = and the residual r = u. 
Repeat the following steps until r = 0: 

Identify Choose a set J of the s biggest coordinates in magnitude of the obser- 
vation vector y = <I>*r, or all of its nonzero coordinates, whichever set is 
smaller. 

Regularize Among all subsets Jo C J with comparable coordinates: 



choose Jo with the maximal energy ||?/|jq||2- 
Update Add the set Jo to the index set: / ^ / U Jo, and update the residual: 



Remarks. 

1. We remark here that knowledge about the sparsity level s is required in 
ROMP, as in OMP. There are several ways this information may be obtained. Since 
the number of measurements m is usually chosen to be 0{s log d), one may then 
estimate the sparsity level s to be roughly m/ logrf. An alternative approach would 
be to run ROMP using various sparsity levels and choose the one which yields the 



y{i)\ < 2|i/(j)l for all i,j e Jo, 



w = argmin \\u — ^z\\2; 



r = u — ^w. 
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least error \\^x — ^x\\ for outputs x. Choosing testing levels out of a geometric 
progression, for example, would not contribute significantly to the overall runtime. 

2. Clearly in the case where the signal is not exactly sparse and the signal and 
measurements are corrupted with noise, the algorithm as described above will never 
halt. Thus in the noisy case, we simply change the halting criteria by allowing the 
algorithm iterate at most s times, or until |/| > s. We show below that with this 
modification ROMP approximately reconstructs arbitrary signals. 

3.1.2 Main Theorems 

In this section we present the main theorems for ROMP. We prove these theorems 
in Section I3.1.3[ When the measurement matrix $ satisfied the Restricted Isometry 
Condition, ROMP exactly recovers all sparse signals. This is summarized in the 
following theorem from [55] . 

Theorem 3.1.1 (Exact sparse recovery via ROMP ^55j). Assume a measurement 
matrix $ satisfies the Restricted Isometry Condition with parameters {2s, e) fore = 
0.03/v^logs. Let x be an s-sparse vector in M.'^ with measurements u = $x. Then 
ROMP in at most s iterations outputs a set I such that 

supp(x) C / and \I\ < 2s. 

Remarks. 1. Theorem 13.1.11 shows that ROMP provides exact recovery of sparse 
signals. Using the index set J, one can compute the signal x from its measurements 
u = ^x as X = where $/ denotes the measurement matrix $ restricted to 

the columns indexed by /. 

2. Theorem 13.1.11 provides uniform guarantees of sparse recovery, meaning that 
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once the measurement matrix $ satisfies the Restricted Isometry Condition, ROMP 
recovers every sparse signal from its measurements. Uniform guarantees such as this 
are now known to be impossible for OMP [57], and finding a version of OMP providing 
uniform guarantees was previously an open problem [62] . Theorem 13.1.11 shows that 
ROMP solves this problem. 

3. Recall from Section 12.1.21 that random Gaussian, Bernoulli and partial 
Fourier matrices with number of measurements m almost linear in the sparsity s, 
satisfy the Restricted Isometry Condition. It is still unknown whether OMP works 
at all with partial Fourier measurements, but ROMP gives sparse recovery for these 
measurements, and with uniform guarantees. 

4. In Section [3.1.41 we explain how the identification and regularization steps of 
ROMP can easily be performed efficiently. In Section 13. 1.41 we show that the running 
time of ROMP is comparable to that of OMP in theory, and is better in practice. 

Theorem 13.1.11 shows ROMP works correctly for signals which are exactly sparse. 
However, as mentioned before, ROMP also performs well for signals and measure- 
ments which are corrupted with noise. This is an essential property for an algorithm 
to be realistically used in practice. The following theorem from [54J shows that 
ROMP approximately reconstructs sparse signals with noisy measurements. Corol- 
lary 13.1.31 shows that ROMP also approximately reconstructs arbitrary signals with 
noisy measurements. 

Theorem 3.1.2 (Stability of ROMP under measurement perturbations [M])- Let $ 

be a measurement matrix satisfying the Restricted Isometry Condition with parame- 
ters (4s, e) fore = 0.01/-\/logs. Let x & Mf^ be an s-sparse vector. Suppose that the 
measurement vector ^x becomes corrupted, so that we consider u = + e where e 
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is some error vector. Then ROMP produces a good approximation x to x: 



\\x — X II2 < 104v/log^||e||2. 



Corollary 3.1.3 (Stability of ROMP under signal perturbations [M])- Let ^ be a 

measurement matrix satisfying the Restricted Isometry Condition with parameters 
{Ss,e) for e = O.Ol/Vlogs. Consider an arbitrary vector x in M.'^. Suppose that the 
measurement vector $x becomes corrupted, so we consider u = $x + e where e is 
some error vector. Then ROMP produces a good approximation x to X2s-' 



Remarks. 

1. In the noiseless case, Theorem 13 . 1 . 21 coincides with Theorem 13.1.11 in showing 
exact recovery. 

2. Corollary 13.1.31 still holds (with only the constants changed) when the term 
X2s is replaced by X(i+s)s for any 6 > 0. This is evident by the proof of the corollary 
given below. 

2. Corollary 13. 1.31 also implies the following bound on the entire signal x: 




(3.1.1) 




(3.1.2) 
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|a; - a^lh < p - a;2s||2 + - X2s||2 

< 159Vlog2s(||e||2 + ^^tI^) + - ^s) ~ {x - x,), 

/z /,, ,, \\X — Xo iX \\X — 

< 159v/log2s e 2 + - ^ +^ ^ 

< 160v/log2s(||e||2 + 



X X g \ 



where the first inequality is the triangle inequality, the second uses Corollary 13.1.31 
and the identity x — X2s = {x — Xg) — (x — Xs)s, and third uses Lemma [3.1.191 below. 

3. The error bound for Basis Pursuit given in Theorem I2.1.5[ is similar except 
for the logarithmic factor. We again believe this to be an artifact of our proof, and 
our empirical results in Section 13.1.51 show that ROMP indeed provides much better 
results than the corollary suggests. 

4. In the case of noise with Basis Pursuit, the problem ( 12.1.5^ needs to be 
solved, which requires knowledge about the noise vector e. ROMP requires no such 
knowledge. 

5. If instead one wished to compute a 2s-sparse approximation to the signal, 
one may just retain the 2s largest coordinates of the reconstructed vector x. In this 
case Corollary 13.1.31 implies the following: 

Corollary 3.1.4. Assume a measurement matrix $ satisfies the Restricted Isometry 



Condition with parameters {8s, e) for e = 0.01/v^logs. Then for an arbitrary vector 



X m 



\x2s - X2s\\2 < 477A/log2s^||e||2 + 



1 3y Xj s 1 1 1 
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This corollary is proved in Section I3.1.3[ 

6. As noted earlier, a special class of signals are compressible signals (11.1.11) . and 
for these it is straightforward to see that fl3.1.2p gives us the following error bound 



As observed in [5], this bound is optimal (within the logarithmic factor), meaning no 
algorithm can perform fundamentally better. 

3.1.3 Proofs of Theorems 

In this section we include the proofs of Theorems 13.1.11 and 13.1.21 and Corollaries 13. 1.31 
and 13.1.41 The proofs presented here originally appeared in [55] and [5lj . 

Proof of Theorem 13.1.11 

We shall prove a stronger version of Theorem 13.1.11 which states that at every itera- 
tion of ROMP, at least 50% of the newly selected coordinates are from the support 
of the signal x. 

Theorem 3.1.5 (Iteration Invariant of ROMP). Assume $ satisfies the Restricted 
Isometry Condition with parameters {2s, e) for e = 0.03/v^log s. Let x be a 
s-sparse vector with measurements u = ^x. Then at any iteration of ROMP, after 
the regularization step, we have Jq 7^ 0? Jq H / = and 



for ROMP: 



X — x\\2 < C'^ 



1 sg-1/2 



+ C"^\^s\\e\\2. 



Jo n supp(x)| > -|Jo 



(3.1.3) 



In other words, at least 50% of the coordinates in the newly selected set Jq belong to 
the support of x. 
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In particular, at every iteration ROMP finds at least one new coordinate in the 
support of the signal x. Coordinates outside the support can also be found, but 
fl3.1.3p guarantees that the number of such "false" coordinates is always smaller than 
those in the support. This clearly implies Theorem 13. 1.1 [ 

Before proving Theorem 13.1.51 we explain how the Restricted Isometry Condition 
will be used in our argument. RIC is necessarily a local principle, which concerns not 
the measurement matrix $ as a whole, but its submatrices of s columns. All such 
submatrices $/, / C {1, . . . |/| < s are almost isometries. Therefore, for every 
s-sparse signal a;, the observation vector y = approximates x locally, when 

restricted to a set of cardinality s. The following proposition formalizes these local 
properties of $ on which our argument is based. 

Proposition 3.1.6 (Consequences of Restricted Isometry Condition). Assume a 
measurement matrix $ satisfies the Restricted Isometry Condition with parameters 
{2s, e). Then the following holds. 

1. (Local approximation) For every s-sparse vector a; G M'' and every set I C 
{1, . . . , d}, \I\ < s, the observation vector y = satisfies 

\\y\i - x\i\\2 < 2.03£:||a;||2. 

2. (Spectral norm) For any vector z G M™ and every set Id {1, . . . , d], \I\ < 2s, 
we have 

\\{^*z)Uh<{l+e)\\zh. 

3. (Almost orthogonality of columns) Consider two disjoint sets I, J C {1, . . . , d}, 
|/U J| < 2s. Let Pj, Pj denote the orthogonal projections in onto range($/) 
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and range{^j), respectively. Then 

\\PiPj\\2^2 < 2.2e. 

Proof. Part 1. Let F = 7 U supp(a:), so that |r| < 2s. Let Idr denote the identity 
operator on MJ^. By the Restricted Isometry Condition, 



|$r^r - Idr ||2^2 = sup | ||$rw||^ - y-u;!!^! < (1 + s)^ - 1 < 2.03s. 

weMF, \\w\\2=i 



Since supp(a;) C F, we have 



|y|r - x\r\\2 = ll^r^ra; - Idx||2 < 2.03£||x||2. 



The conclusion of Part 1 follows since / C F. 

Part 2. Denote by Qi the orthogonal projection in R*^ onto W . Since |/| < 2s, 
the Restricted Isometry Condition yields 

This yields the inequahty in Part 2. 

Part 3. The desired inequality is equivalent to: 

' ,, ', , ' < 2.2£ for all x e range($/), y G range($j). 

imuWyh 

Let K = I L) J so that \K\ < 2s. For any x e range(<l>7),7/ e range(<l>j), there are 
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a, b so that 

X = $^a, y = ^Kb, a G be . 
By the Restricted Isometry Condition, 



Fih > (1 - mah, hh > (1 - £^)ni2- 

By the proof of Part 2 above and since {ab) = 0, we have 

\{x,y)\ = |((<I>:^<I>K-Idr)a,&)| <2.03£||a||2||6||2. 

This yields 

K^-y)! < < 2.2., 

f||2||?/||2 (1-^)^ 

which completes the proof. □ 
We are now ready to prove Theorem 13.1.51 

The proof is by induction on the iteration of ROMP. The induction claim is that 
for all previous iterations, the set of newly chosen indices Jo is nonempty, disjoint 
from the set of previously chosen indices /, and fl3.1.3l) holds. 

Let / be the set of previously chosen indices at the start of a given iteration. The 
induction claim easily implies that 

|supp(a;) U /| < 2s. (3.1.4) 



Let Jo, J, be the sets found by ROMP in the current iteration. By the definition of 
the set Jo, it is nonempty. 
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Let r 7^ be the residual at the start of this iteration. We shall approximate r by 
a vector in range($supp(x)\/)- That is, we want to approximately realize the residual 
r as measurements of some signal which lives on the still unfound coordinates of the 
the support of x. To that end, we consider the subspace 

H := range(<l>supp(^)u/) 

and its complementary subspaces 

F := range($/), Eq := range(<l>supp(x)\/)- 

The Restricted Isometry Condition in the form of Part 3 of Proposition 13 . 1 . 61 ensures 
that F and Eq are almost orthogonal. Thus Eq is close to the orthogonal complement 
of F in H, 

E:= F^n H. 




We will also consider the signal we seek to identify at the current iteration, its 
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measurements, and its observation vector: 



xo := x\supp{x)\i, uo := ^Xq G Eq, yo ■= ^*Uo. 



(3.1.5) 



Lemma [3.1.91 will show that \\{y — ?/o)|t||2 for any small enough subset T is small, 
and Lemma [3.1.121 will show that ||?/|jo||2 is not too small. First, we show that the 
residual r has a simple description: 

Lemma 3.1.7 (Residual). Here and thereafter, let denote the orthogonal projec- 
tion in onto a linear subspace L. Then 



Proof. By definition of the residual in the algorithm, r = Pp±u. Since m G if, we 
conclude from the orthogonal decomposition H = F + E that u = Ppu + Peu. Thus 



To guarantee a correct identification of Xq, we first state two approximation lem- 
mas that refiect in two different ways the fact that subspaces Eq and E are close to 
each other. This will allow us to carry over information from Eq to E. 

Lemma 3.1.8 (Approximation of the residual). We have 



Proof. By definition of F, we have u—Uq = ^{x—Xq) G F. Therefore, by Lemma [3.1.7[ 

r = Peu = PeUq, and so 



r = Peu. 



r = u — Pfu = Peu. 



□ 



Uq - r\\2 < 2.2£:||?io||2- 



uo-r = Uo- PeUq = PfUq = PfPeqUo- 
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Now we use Part 3 of Proposition 13.1.61 for the sets / and supp(a;) \ I whose union 
has cardinality at most 2s by (13.1.41) . It follows that ||-Pf-Peo'"o||2 < 2.2£:||'Uo||2 as 
desired. □ 

Lemma 3.1.9 (Approximation of the observation). Consider the observation vectors 
Ho = ^*uq and y = $*r. Then for any set T C {1, . . . ,d} with \T\ < 2s, we have 

\\{yo - y)\T\\2 < 2.4£:||xo||2. 

Proof. Since uq = $xo, we have by Lemma [3. 1.8 1 and the Restricted Isometry Condi- 
tion that 

\\uo-r\\2 < 2.2£||$xo||2 < 2.2e(l + £)||xo||2 < 2.3e||xo||2. 

To complete the proof, it remains to apply Part 2 of Proposition 13.1.6^ which yields 

\\{yo - y)\Th < {I + e)\\uo - ry. □ 

We next show that the energy (norm) of y when restricted to J, and furthermore 
to Jq, is not too small. By the approximation lemmas, this will yield that ROMP 
selects at least a fixed percentage of energy of the still unidentified part of the signal. 
By the regularization step of ROMP, since all selected coefficients have comparable 
magnitudes, we will conclude that not only a portion of energy but also of the support 
is selected correctly. This will be the desired conclusion. 

Lemma 3.1.10 (Localizing the energy). We have ||y|j||2 > 0.8||xo||2- 

Proof. Let S = supp(a;) \ /. Since 15*1 < s, the maximality property of J in the 
algorithm implies that 

llz/oljlh > IboUlh- 
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Furthermore, since xo\s = xo, by Part 1 of Proposition 13. 1.61 we have 

\\yo\sh > (l-2.03£)||a;o||2. 
Putting these two inequalities together and using Lemma I3.1.9[ we conclude that 

||y|j||2 > (1 - 2.035) ||a:o||2 - 2.4£||xo||2 > 0.8||xo||2. 

This proves the lemma. □ 

We next bound the norm of y restricted to the smaller set Jq. We do this by first 
noticing a general property of regularization: 

Lemma 3.1.11 (Regularization). Let v be any vector in M™, m > 1. Then there 
exists a subset A <Z {I, . . . , m} with comparable coordinates: 

\v{i)\ < 2\v{j)\ foralli,jeA, (3.1.6) 



and with big energy: 



|^U||2 > „ - r. Ph. (3.1.7) 

z.ovlogm 



Proof. We will construct at most O(logm) subsets Ak with comparable coordinates 
as in fl3.1.6p . and such that at least one of these sets will have large energy as in 
fl3X7|) . 

Let V = (t>i, . . . , Vm), and consider a partition of {1, ... , m} using sets with com- 
parable coordinates: 

Ak := {t : < h\ < 2-''+'\\v\\2}, k = l,2,... 
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Let ko = [logm] + 1, so that \vi\ < ;;^||'y||2 for all i G Ak, k > k^. Then the set 
U = IJfc<fco contains most of the energy of v: 

Thus 

/ 2\l/2 /II ii2 2\l/2 ^ 1 II II 

( 2^ fUJIs) = Muh = [\\v\\2 - \\v\u42) >^\\n2- 



k<kQ 

Therefore there exists k < ko such that 



1 „ „ 1 „ „ 

\\v\aJ\2 > ^^IK'll2 > ^ . n Fib, 

y/zko 2.b^/logm 
which completes the proof. □ 

In our context, Lemma [3 . 1 . 1 1 1 applied to the vector y\j along with Lemma [3. 1.101 
directly implies: 

Lemma 3.1.12 (Regularizing the energy). We have 



I I II ^ 0-32 II 11 
Vlogs 



We now finish the proof of Theorem 13.1.51 

To show the first claim, that Jq is nonempty, we note that Xq ^ 0. Indeed, 
otherwise by (13.1.51) we have / C supp(x), so by the definition of the residual in 
the algorithm, we would have r = at the start of the current iteration, which is a 
contradiction. Then Jq 7^ by Lemma 13.1.121 

The second claim, that Jo fl / = 0, is also simple. Indeed, recall that by the 
definition of the algorithm, r = Pp± G F-^ = (range ($/))-*-. It follows that the 
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observation vector y = $*r satisfies y\i = 0. Since by its definition the set J contains 
only nonzero coordinates of y we have J fl / = 0. Since Jq C J, the second claim 
Jo n / = follows. 

The nontrivial part of the theorem is its last claim, inequality (I3.1.3p . Suppose it 
fails. Namely, suppose that | Jo H supp(x)| < ^| Jo|, and thus 

|Jo\supp(a;)| > ^|Jo|. 

Set A = Jo\supp(x). By the comparability property of the coordinates in Jq and 
since |A| > || Jo|, there is a fraction of energy in A: 

WvUh > 4f bljolh > — ^l|a;ol|2, (3.1.8) 
V5 7 V log s 

where the last inequality holds by Lemma I3.1.12[ 
On the other hand, we can approximate y by yo as 

IbUlb < WvIa - yoUlh + IboUlh- (3.1.9) 

Since A C J and using Lemma 13.1. 9[ we have 

WvIa - l/oUlb < 2.4£:||xo||2 

Furthermore, by definition (13.1.51) of a;o, we have xo|a = 0. So, by Part 1 of Proposi- 
tion [SXel 

||2/o|a||2 < 2.03£||a;o||2. 
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IbUlh < 4.43£:||a;o||2- 

This is a contradiction to fl3.1.8p so long as e < 0.03/v^log s. This proves Theo- 
rem [3]T31 □ 

Proof of Theorem 13.1.21 

The proof of Theorem 13.1.21 parallels that of Theorem 13.1.11 We begin by showing 
that at every iteration of ROMP, either at least 50% of the selected coordinates from 
that iteration are from the support of the actual signal v, or the error bound already 
holds. This directly implies Theorem 13.1.21 

Theorem 3.1.13 (Stable Iteration Invariant of ROMP). Let ^ be a measurement 
matrix satisfying the Restricted Isometry Condition with parameters (4s, e) for e = 
0.01/v^logs. Let X be a non-zero s-sparse vector with measurements u = $a; + e. 
Then at any iteration of ROMP, after the regularization step where I is the current 
chosen index set, we have Jo H / = and (at least) one of the following: 

(i) I Jo n supp(t;)| > i| Jol; 

(ii) ||a;|supp(x)\/||2 < 100Vlogs||e||2. 

We show that the Iteration Invariant implies Theorem 13.1.21 by examining the 
three possible cases: 

Case 1: (ii) occurs at some iteration. We first note that since |/| is nonde- 
creasing, if (ii) occurs at some iteration, then it holds for all subsequent iterations. 
To show that this would then imply Theorem l3.1.2[ we observe that by the Restricted 
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Isometry Condition and since |supp(a;)| < |/| < 3s, 

(1 — £)\\x — x\\2 — ||e||2 < W^x — ^x — e\\2- 

Then again by the Restricted Isometry Condition and definition of x, 

||$:E-$a;-e||2 < \mx\i) - <t>x - e\\2 < (1 + £)||x|,upp(^)\/||2 + ||e||2. 

Thus we have that 

l + e„ , „ 2 „ „ 
F - ^ 2 < F supp(x)\/ 2 + :; 6 2- 

Thus (ii) of the Iteration Invariant would imply Theorem 13.1.21 

Case 2: (i) occurs at every iteration and Jo is always non-empty. In this 

case, by (i) and the fact that Jo is always non-empty, the algorithm identifies at least 

one element of the support in every iteration. Thus if the algorithm runs s iterations 

or until |/| > 2s, it must be that supp(x) C /, meaning that x|supp(a;)\7 = 0. Then by 

the argument above for Case 1, this implies Theorem I3.1.2[ 

Case 3: (i) occurs at each iteration and Jo = for some iteration. By 

the definition of Jq, if Jq = then y = $*r = for that iteration. By definition of r, 

this must mean that 

$*<I>(x -w) + $*e = 0. 

This combined with Part 1 of Proposition [371. 61 below (and its proof, see |55j) applied 
with the set /' = supp(x) U / yields 

\\x -w + (<l>*e)|//||2 < 2.03£:||x - w\\2. 
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Then combinining this with Part 2 of the same Proposition, we have 

\\x — w\\2 < l.l||e||2. 

Since a;|supp(a;)\7' = {x — 'w)\supp(x)\i , this means that the error bound (ii) must hold, 
so by Case 1 this implies Theorem 13.1.21 

We now turn to the proof of the Iteration Invariant, Theorem 13.1.131 We prove 
Theorem 13.1.131 by inducting on each iteration of ROMP. We will show that at each 
iteration the set of chosen indices is disjoint from the current set / of indices, and 
that either (i) or (ii) holds. Clearly if (ii) held in a previous iteration, it would hold 
in all future iterations. Thus we may assume that (ii) has not yet held. Since (i) has 
held at each previous iteration, we must have 

|J|<2s. (3.1.10) 

Consider an iteration of ROMP, and let r 7^ be the residual at the start of that 
iteration. Let Jq and J be the sets found by ROMP in this iteration. As in [55], we 
consider the subspace 

H := range($supp(i,)u/) 
and its complementary subspaces 

F := range($/), Eq := range($suppW\7)- 



Part 3 of Proposition 13.1.61 states that the subspaces F and Eq are nearly orthogonal. 
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E:=F^n H. 

First we write the residual r in terms of projections onto these subspaces. 

Lemma 3.1.14 (Residual). Here and onward, denote by Pl the orthogonal projection 
in M™ onto a linear subspace L. Then the residual r has the following form: 

r = Pe^x + Pp±e. 

Proof. By definition of the residual r in the ROMP algorithm, r = Pp±u = Pp±($x + 
e). To complete the proof we need that Ppx$x = Pe^x. This follows from the 
orthogonal decomposition H = F + E and the fact that $a; E H. □ 

Next we examine the missing portion of the signal as well as its measurements: 

Xo ■= X\snpp{x)\l, UO ■= ^Xq G Eq. (3.1.11) 

In the next two lemmas we show that the subspaces E and Eq are indeed close. 

Lemma 3.1.15 (Approximation of the residual). Let r be the residual vector and Uq 
as in (13.1. lip . Then 

\\uo - r\\2 < 2.2£:||?io||2 + ||e||2- 

Proof. Since x — Xq has support in /, we have $a; — uq = $(x — Xq) G F. Then by 
Lemma [3.1.71 r = Pe^x + Ppxe = PeUq + Ppxe. Therefore, 

||a;o - r\\2 = \\xo - PeXo - Pp^ey < ||-Pf2;o||2 + ||e||2- 



3.1. Regularized Orthogonal Matching Pursuit 



57 



Note that by fl3.1.10p . the union of the sets / and supp(x) \ / has cardinality no 
greater than 3s. Thus by Part 3 of Prop osit ion 13 . 1 . 6| we have 

||-PfMo||2 + ||e||2 = ||-Pf-Peo^o||2 + ||e||2 < 2.2£||uo||2 + ||e||2. 

□ 

Lemma 3.1.16 (Approximation of the observation). Let yo = $*mo and y = ^*r. 
Then for any set T G {1, . . . ,d} with \T\ < 3s, 



~y)\Th < 2.4£||xo||2 + (l + £)||e||2. 
Proof. By Lemma 13.1.151 and the Restricted Isometry Condition we have 

||mo - r\\2 < 2.2e||$a;o||2 + ||e||2 < 2.2e(l + e)||xo||2 + ||e||2 < 2.3e||xo||2 + ||e||2. 
Then by Part 2 of Proposition 13 . 1 .61 we have the desired result, 

WiVo - y)\T\\2 < (1 + e)\\uo -r\\2. 



□ 



The result of the theorem requires us to show that we correctly gain a portion of 
the support of the signal x. To this end, we first show that ROMP correctly chooses 
a portion of the energy. The regularization step will then imply that the support is 
also selected correctly. We thus next show that the energy of y when restricted to 
the sets J and Jq is sufficiently large. 
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Lemma 3.1.17 (Localizing the energy). Let y be the observation vector and xq be 
asm dXTTT]) . Then jih > 0.8||xo||2 - (1 + ||e||2. 

Proof. Let S = supp(x) \ / be the missing support. Since \S\ < s, by definition of J 
in the algorithm, we have 

\\y\jh > hlsh- 

By Lemma [3.1.16[ 



||?/|5||2>||l/o|5||2-2.45||xo||2-(l+£)||e||2. 

Since xo\s = xq, Part 1 of Proposition 13. 1.61 implies 

||?/o|5||2>(l-2.03£)||xo||2. 

These three inequalities yield 

Ibl j||2 > (1 - 2.03£)||xo||2 - 2.4£||xol|2 - (1 + e)||e||2 > 0.8||a;o||2 - (1 + e)\\eh. 

This completes the proof. □ 

Lemma 3.1.18 (Regularizing the energy). Again lety be the observation vector and 
xq be as in (13.1. lip . Then 



\\y\joh>-r^\\xoh- 



Proof. By Lemma [3.1.111 apphed to the vector y\j, we have 



||y|jol|2 > „ ^ \ l|y|j||2- 

2.5vlogs 
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Along with Lemma 13.1.171 this implies the claim. 
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□ 



We now conclude the proof of Theorem I3.1.13[ The claim that Jq fl / = follows 
by the same arguments as in [55]. 

It remains to show its last claim, that either (i) or (ii) holds. Suppose (i) in the 
theorem fails. That is, suppose |Jo nsupp(x)| < h\Jo\, which means 



|Jo\supp(a;)| > ^|Jo|- 



Set A = Jo\supp(x). By the definition of Jq in the algorithm and since |A| > \\Jo\, 
we have by Lemma [3.1.181 

WyWh > 4f llz/ljolh > . \\xoh - o S — (3.1.12) 

V5 4v51ogs 2v51ogs 

Next, we also have 

llz/Ulb < WvIa - l/oUlh + IboUlb- (3.1.13) 
Since A C J and |J| < s, by Lemma 13.1.161 we have 

WvIa - Z/oUlb < 2.4£:||xo||2 + (1 + ^^)||e||2- 

By the definition of Xq in (13.1. lip , it must be that Xo|a = 0. Thus by Part 1 of 
Proposition 13.1.61 

Ill/oUlb < 2.03£:||2;o||2. 
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Using the previous inequalities along with (I3.1.13p . we deduce that 
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llz/Ulh <4.43£||a;o||2 + (l + £)||e||2. 

This is a contradiction to fl3.1.12p whenever 

/ 0.02 ||e||2 
e < 



Vlogs llxolh 

If this is true, then indeed (i) in the theorem must hold. If it is not true, then by the 
choice of e, this implies that 

||a;o||2 < 100 II e II 2 A/logs. 
This proves Theorem 13. 1.13[ □ 
Proof of Corollary 13X31 

Proof. We first partition x so that u = $X2s + $(x — X2s) + e. Then since $ satisfies 
the Restricted Isometry Condition with parameters (8s, e), by Theorem 13.1.21 and the 
triangle inequality. 



X2s - X 2 



< 104Vlog2s(||$(x - X2s)\\2 + ||e||2), (3.1.14) 



The following lemma as in [32] relates the 2-norm of a vector's tail to its 1-norm. An 
application of this lemma combined with (13.1.141) will prove Corollary 13.1.31 

Lemma 3.1.19 (Comparing the norms). Let v E W^, and let Vt be the vector of the 
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T largest coordinates in absolute value from v. Then 
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I II ^ 1 

\V — Vt\\2 < 



2Vt' 

Proof. By linearity, we may assume = d. Since vt consists of the largest T 



coordinates of v in absolute value, we must have that ||f — frib < y/d — T. (This is 
because the term H^; — frlh is greatest when the vector v has constant entries.) Then 
by the AM-GM inequality, 

\\v - VThVr < Vd-rVr <{d-T + T)/2 = d/2 = \\v\\i/2. 

This completes the proof. □ 
By Lemma 29 of [32j, we have 

\mx - x,,)h <{l + e) {\\x - x^sh + t^l^) . 

Applying Lemma 13.1.191 to the vector v = x — Xg we then have 



Combined with (I3.1.14p . this proves the corollary. 



□ 



Proof of Corollary 13.1.41 

Often one wishes to find a sparse approximation to a signal. We now show that by 
simply truncating the reconstructed vector, we obtain a 2s-sparse vector very close 
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to the original signal. 

Proof. Let xs '■= X2s and xt '■= X2s, and let 5* and T denote the supports of xs and 
Xt respectively. By Corollary 13. 1.31 it suffices to show that Hx^ — Xt'||2 < 3||x5 — x||2. 
Applying the triangle inequality, we have 

\\xs - xrh < \\{xs - xt)\t\\2 + \\xs\s\t\\2 =: a + b. 

We then have 

a = \\{xs - xt)|t||2 = \\{xs - x)\t\\2 < ll^^s - x\\2 

and 

h < P|s\t||2 + \\{xs - ^^)U\t||2- 

Since \S\ = \T\, we have |S'\T| = jTyS*!. By the definition of T, every coordinate of 
X in T is greater than or equal to every coordinate of x in T'^ in absolute value. Thus 
we have, 

||x|5\t||2 < PInsIb = Wixs - x)|t\5||2- 

Thus & < 2||x5 — a;||2, and so 

a + b < 3\\xs — x\\2. 

This completes the proof. □ 



Remark. Corollary 13.1.41 combined with Corollary 13.1.31 and f l3.1.2p implies that we 
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can also estimate a bound on the whole signal v: 



\x -a;2s||2 < C A/log 2s (^||e| 



1 1 X X 5 1 1 ]^ 
2 H 



3.1.4 Implementation and Runtime 

The Identification step of ROMP, i.e. selection of the subset J, can be done by sorting 
the coordinates of y in the nonincreasing order and selecting s biggest. Many sorting 
algorithms such as Mergesort or Heapsort provide running times of 0{d\ogd). 

The Regularization step of ROMP, i.e. selecting Jo C J, can be done fast by 
observing that Jq is an interval in the decreasing rearrangement of coefficients. More- 
over, the analysis of the algorithm shows that instead of searching over all intervals 
Jo, it suffices to look for Jq among 0(log s) consecutive intervals with endpoints where 
the magnitude of coefficients decreases by a factor of 2. (these are the sets Ak in the 
proof of Lemma [3.1. lip . Therefore, the Regularization step can be done in time 0{s). 

In addition to these costs, the /c-th iteration step of ROMP involves multiplication 
of the d X m matrix $* by a vector, and solving the least squares problem with the 
d X I J| matrix $/, where |/| < 2s. For unstructured matrices, these tasks can be done 
in time dm and 0{s'^m) respectively p]. Since the submatrix of $ when restricted to 
the index set / is near an isometry, using an iterative method such as the Conjugate 
Gradient Method allows us to solve the least squares method in a constant number of 
iterations (up to a specific accuracy) [21 Sec. 7.4]. Using such a method then reduces 
the time of solving the least squares problem to just 0{sm). Thus in the cases 
where ROMP terminates after a fixed number of iterations, the total time to solve 
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all required least squares problems would be just 0{sm). For structured matrices, 
such as partial Fourier, these times can be improved even more using fast multiply 
techniques. 

In other cases, however, ROMP may need more than a constant number of iter- 
ations before terminating, say the full 0{s) iterations. In this case, it may be more 
efficient to maintain the QR factorization of $/ and use the Modified Gram-Schmidt 
algorithm. With this method, solving all the least squares problems takes total time 
just O(s^m). However, storing the QR factorization is quite costly, so in situations 
where storage is limited it may be best to use the iterative methods mentioned above. 

ROMP terminates in at most 2s iterations. Therefore, for unstructured matrices 
using the methods mentioned above and in the interesting regime m > log d, the total 
running time of ROMP is O(dNn). This is the same bound as for OMP [62j . 

3.1.5 Numerical Results 

Noiseless Numerical Studies 

This section describes our experiments that illustrate the signal recovery power of 
ROMP, as shown in [5^ . See Section IA.3I for the Matlab code used in these studies. 
We experimentally examine how many measurements m are necessary to recover 
various kinds of s-sparse signals in using ROMP. We also demonstrate that the 
number of iterations ROMP needs to recover a sparse signal is in practice at most 
linear the sparsity. 

First we describe the setup of our experiments. For many values of the ambient 
dimension d, the number of measurements m, and the sparsity s, we reconstruct 
random signals using ROMP. For each set of values, we generate an m x d Gaussian 
measurement matrix $ and then perform 500 independent trials. The results we 
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obtained using Bernoulli measurement matrices were very similar. In a given trial, 
we generate an s-sparse signal x in one of two ways. In either case, we first select the 
support of the signal by choosing s components uniformly at random (independent 
from the measurement matrix $). In the cases where we wish to generate flat signals, 
we then set these components to one. Our work as well as the analysis of Gilbert and 
Tropp [62] show that this is a challenging case for ROMP (and OMP). In the cases 
where we wish to generate sparse compressible signals, we set the i^^ component of 
the support to plus or minus i'^^^ for a specified value of < p < 1. We then execute 
ROMP with the measurement vector u = $x. 

Figure 13.1.11 depicts the percentage (from the 500 trials) of sparse fiat signals that 
were reconstructed exactly. This plot was generated with d = 256 for various levels 
of sparsity s. The horizontal axis represents the number of measurements m, and 
the vertical axis represents the exact recovery percentage. We also performed this 
same test for sparse compressible signals and found the results very similar to those 
in Figure 13.1.11 Our results show that performance of ROMP is very similar to that 
of OMP which can be found in [62] . 

Figure 13.1.21 depicts a plot of the values for m and s at which 99% of sparse fiat 
signals are recovered exactly. This plot was generated with d = 256. The horizontal 
axis represents the number of measurements m, and the vertical axis the sparsity 
level s. 

Theorem 13.1.11 guarantees that ROMP runs with at most 0{s) iterations. Fig- 
ure 13.1.31 depicts the number of iterations executed by ROMP for d = 10, 000 and 
m = 200. ROMP was executed under the same setting as described above for sparse 
fiat signals as well as sparse compressible signals for various values of p, and the 
number of iterations in each scenario was averaged over the 500 trials. These aver- 
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ages were plotted against the sparsity of the signal. As the plot illustrates, only 2 
iterations were needed for flat signals even for sparsity s as high as 40. The plot also 
demonstrates that the number of iterations needed for sparse compressible is higher 
than the number needed for sparse fiat signals, as one would expect. The plot sug- 
gests that for smaller values of p (meaning signals that decay more rapidly) ROMP 
needs more iterations. However it shows that even in the case of p = 0.5, only 6 
iterations are needed even for sparsity s as high as 20. 



Percentage of input signals recovered exactly (d=256) (Gaussian) 
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Figure 3.1.1: The percentage of sparse fiat signals exactly recovered by ROMP as a 
function of the number of measurements in dimension d = 256 for various levels of 
sparsity. 



Noisy Numerical Studies 

This section describes our numerical experiments that illustrate the stability of ROMP 
as shown in [54]. We study the recovery error using ROMP for both perturbed mea- 
surements and signals. The empirical recovery error is actually much better than 
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99% Recovery Trend, d=256 
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Figure 3.1.2: The 99% recovery limit as a function of the sparsity and the number of 
measurements for sparse fiat signals. 

that given in the theorems. 

First we describe the setup to our experimental studies. We run ROMP on various 
values of the ambient dimension the number of measurements m, and the sparsity 
level s, and attempt to reconstruct random signals. For each set of parameters, we 
perform 500 trials. Initially, we generate an m x (i Gaussian measurement matrix 

For each trial, independent of the matrix, we generate an s-sparse signal x by 
choosing s components uniformly at random and setting them to one. In the case of 
perturbed signals, we add to the signal a ci-dimensional error vector with Gaussian 
entries. In the case of perturbed measurements, we add an m-dimensional error vector 
with Gaussian entries to the measurement vector $x. We then execute ROMP with 
the measurement vector u = $x or -u + e in the perturbed measurement case. After 
ROMP terminates, we output the reconstructed vector x obtained from the least 
squares calculation and calculate its distance from the original signal. 
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Figure 3.1.3: The number of iterations executed by ROMP as a function of the 
sparsity in dimension d = 10, 000 with 200 measurements. 



Figure 13.1.41 depicts the recovery error \\x — x\\2 when ROMP was run with per- 
turbed measurements. This plot was generated with d = 256 for various levels of 
sparsity s. The horizontal axis represents the number of measurements m, and the 
vertical axis represents the average normalized recovery error. Figure 13.1.41 confirms 
the results of Theorem I3.1.2[ while also suggesting the bound may be improved by 
removing the v^log s factor. 

Figure 13.1.51 depicts the normalized recovery error when the signal was perturbed 
by a Gaussian vector. The figure confirms the results of Corollary 13.1.31 while also 
suggesting again that the logarithmic factor in the corollary is unnecessary. 



3.1.6 Summary 

There are several critical properties that an ideal algorithm in compressed sensing 
should possess. One such property is stability, guaranteeing that under small pertur- 
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Figure 3.1.4: The error to noise ratio 
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ments m in dimension d = 256 for various levels of sparsity s. 



bations of the inputs, the algorithm still performs approximately correct. Secondly, 
the algorithm needs to provide uniform guarantees, meaning that with high proba- 
bility the algorithm works correctly for all inputs. Finally, to be ideal in practice, 
the algorithm would need to have a fast runtime. The £i-minimization approach to 
compressed sensing is stable and provides uniform guarantees, but since it relies on 
the use of Linear Programming, lacks a strong bound on its runtime. The greedy ap- 
proach is quite fast both in theory and in practice, but had lacked both stability and 
uniform guarantees. We analyzed the restricted isometry property in a unique way 
and found consequences that could be used in a greedy fashion. Our breakthrough 
algorithm ROMP is the first to provide all these benefits (stability, uniform guaran- 
tees, and speed), and essentially bridges the gap between the two major approaches 
in compressed sensing. 
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Normalized Recovery Error from ROMP on Perturbed Signals , d=256 
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Figure 3.1.5: The error to noise ratio 
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3.2 Compressive Sampling Matching Pursuit 



Regularized Orthogonal Matching Pursuit bridged a critical gap between the major 
approaches in compressed sensing. It provided the speed of the greedy approach and 
the strong guarantees of the convex optimization approach. Although its contribu- 
tions were significant, it still left room for improvement. The requirements imposed by 
ROMP on the restricted isometry condition were slightly stronger than those imposed 
by the convex optimization approach. This then in turn weakened the error bounds 
provided by ROMP in the case of noisy signals and measurements. These issues were 
resolved by our algorithm Compressive Sampling Matching Pursuit (CoSaMP). A 
similar algorithm, Subspace Matching Pursuit was also developed around this time, 
which provides similar benefits to those of CoSaMP. See [H] for details. 
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3.2.1 Description 

One of the key differences between ROMP and OMP is that at each iteration ROMP 
selects more than one coordinate to be in the support set. Because of this, ROMP 
is able to make mistakes in the support set, while still correctly reconstructing the 
original signal. This is accomplished because we bound the number of incorrect 
choices the algorithm can make. Once the algorithm chooses an incorrect coordinate, 
however, there is no way for it to be removed from the support set. An alternative 
approach would be to allow the algorithm to choose incorrectly as well as fix its 
mistakes in later iterations. In this case, at each iteration we select a slightly larger 
support set, reconstruct the signal using that support, and use that estimation to 
calculate the residual. 

Tropp and Needell developed a new variant of OMP, Compressive Sampling 
Matching Pursuit (CoSaMP) [531 E2]- This new algorithm has the same uniform 
guarantees as ROMP, but does not impose the logarithmic term for the Restricted 
Isometry Property or in the error bounds. Since the sampling operator $ satisfies 
the Restricted Isometry Property, every s entries of the signal proxy y = are 
close in the Euclidean norm to the s corresponding entries of the signal x. Thus as 
in ROMP, the algorithm first selects the largest coordinates of the signal proxy y 
and adds these coordinates to the running support set. Next however, the algorithm 
performs a least squares step to get an estimate b of the signal, and prunes the signal 
to make it sparse. The algorithm's major steps are described as follows: 

1. Identification. The algorithm forms a proxy of the residual from the current 
samples and locates the largest components of the proxy. 

2. Support Merger. The set of newly identified components is united with the 
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set of components that appear in the current approximation. 

3. Estimation. The algorithm solves a least-squares problem to approximate the 
target signal on the merged set of components. 

4. Pruning. The algorithm produces a new approximation by retaining only the 
largest entries in this least-squares signal approximation. 

5. Sample Update. Finally, the samples are updated so that they reflect the 
residual, the part of the signal that has not been approximated. 

These steps are repeated until the halting criterion is triggered. Initially, we concen- 
trate on methods that use a fixed number of iterations. Section 13.2.41 discusses some 
other simple stopping rules that may also be useful in practice. Using these ideas, 
the pseudo-code for CoSaMP can thus be described as follows. 
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Compressive Matching Pursuit (CoSaMP) [53J 

Input: Measurement matrix $, measurement vector u = $x, sparsity level s 

Output: s-sparse reconstructed vector x = a 

Procedure: 

Initialize Set = 0, v = u, k = 0. Repeat the following steps and increment k 
until the halting criterion is true. 

Signal Proxy Set y = $*t>, Q = supp|/2s and merge the supports: T = Q U 
suppa*^"^. 

Signal Estimation Using least-squares, set 6|r = ^q-u and b\T'^ = 0. 
Prune To obtain the next approximation, set = &«. 
Sample Update Update the current samples: v = u — $a^. 

There are a few major concepts of which the algorithm CoSaMP takes advan- 
tage. Unlike some other greedy algorithms, CoSaMP selects many components at 
each iteration. This idea can be found in theoretical work on greedy algorithms by 
Temlyakov as well as some early work of Gilbert, Muthukrishnan, Strauss and Tropp 
[51] , [nS] ■ It is also the key idea of recent work on the Fourier sampling algorithm [33] . 
The ROMP and StOMP algorithms also incorporate this notion [55], [23j . 

The application of the Restricted Isometry Property to compare the norms of 
vectors under the action of the sampling operator and its adjoint is also key in this 
algorithm and its analysis. The Restricted Isometry Property is due to Candes and 
Tao [n]. The application of the property to greedy algorithms is relatively new, and 
appears in [32] and [55] . 

Another key idea present in the algorithm is the pruning step to maintain sparsity 
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of the approximation. This also has significant ramifications in other parts of the 
analysis and the running time. Since the Restricted Isometry Property only holds for 
sparse vectors, it is vital in the analysis that the approximation remain sparse. This 
idea also appears in [32] . 

Our analysis focuses on mixed-norm error bounds. This idea appears in the work 
of Candes, Romberg, Tao [5] as well as [32] and [10] . In our analysis, we focus on the 
fact that if the error is large, the algorithm must make progress. This idea appears 
in work by Gilbert and Strauss, for example [32] . 

The Li-minimization method and the ROMP algorithm provide the strongest 
known guarantees of sparse recovery. These guarantees are uniform in that once 
the sampling operator satisfies the Restricted Isometry Property, both methods work 
correctly for all sparse signals. Li-minimization is based on linear programming, 
which provides only a polynomial runtime. Greedy algorithms such as OMP and 
ROMP on the other hand, are much faster both in theory and empirically. Our 
algorithm CoSaMP provides both uniform guarantees as well as fast runtime, while 
improving upon the error bounds and Restricted Isometry requirements of ROMP. 
We describe these results next as we state the main theorems. 

3.2.2 Main Theorem 

Next we state the main theorem which guarantees exact reconstruction of sparse 
signals and approximate reconstruction of arbitrary signals. The proof of the theorem 
is presented in Section 13.2.31 

Theorem 3.2.1 (CoSaMP [53]). Suppose that ^ is an m x d sampling matrix with 
restricted isometry constant 62s < c, as in (12.1.41) . Let u = + e be a vector 
of samples of an arbitrary signal, contaminated with arbitrary noise. For a given 



3.2. Compressive Sampling Matching Pursuit 75 

precision parameter rj, the algorithm CoSaMP produces an s-sparse approximation x 
that satisfies 

\\x — x\\2 < C ■ max |?7, ||x — a;s/2||-^ + ||e||2|' 

where Xs/2 is a best {s/ 2) -sparse approximation to x. The running time is 0(=Sf • 
log(||x||2 Z"?])), where ^ bounds the cost of a matrix-vector multiply with $ or $*. 
Working storage is 0{d). 

Remarks. 1. We note that as in the case of ROMP, CoSaMP requires knowledge 
of the sparsity level s. As described in Section 13.1.11 there are several strategies to 
estimate this level. 

2. In the hypotheses, a bound on the restricted isometry constant 62s also 
suffices. Indeed, Corollary 13.2.71 of the sequel implies that 64s < 0.1 holds whenever 
§28 < 0.025. 

3. Theorem 13.2.11 is a result of running CoSaMP using an iterative algorithm 
to solve the least-squares step. We analyze this step in detail below. In the case of 
exact arithmetic, we again analyze CoSaMP and provide an iteration count for this 
case: 

Theorem 3.2.2 (Iteration Count). Suppose that CoSaMP is implemented with exact 
arithmetic. After at most 6{s + 1) iterations, CoSaMP produces an s-sparse approx- 
imation X that satisfies 

\\x — a\\2 < 20z/, 

where v is the unrecoverable energy (13.2.11) . 

See Theorem 13.2.221 in Section 13.2.31 below for more details. 

The algorithm produces an s-sparse approximation whose £2 error is comparable 
with the scaled l\ error of the best (s/2)-sparse approximation to the signal. Of 
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course, the algorithm cannot resolve the uncertainty due to the additive noise, so we 
also pay for the energy in the noise. This type of error bound is structurally optimal, 
as discuss when describing the unrecoverable energy below. Some disparity in the 
sparsity levels (here, s versus s/2) seems to be necessary when the recovery algorithm 
is computationally efficient [58] . 

To prove our theorem, we show that CoSaMP makes significant progress during 
each iteration where the approximation error is large relative to unrecoverable energy 
V in the signal. This quantity measures the baseline error in our approximation that 
occurs because of noise in the samples or because the signal is not sparse. For our 
purposes, we define the unrecoverable energy by the following. 



1 

u — 1 1 3^ 3^ ^ 1 1 2 H~ ^ — 1 1 ^ ^ s 1 1 ]^ 11^112* ^3.2. Xj 



The expression (13.2.11) for the unrecoverable energy can be simplified using Lemma 7 
from [32], which states that, for every signal y G and every positive integer t, we 
have 

ll!/-»ll,<5^ll!/|l,. 

Choosing y = x — Xs/2 and t = s/2, we reach 



iy<^l\\x-x,/2\\, + \\e\\,. (3.2.2) 



In words, the unrecoverable energy is controlled by the scaled ii norm of the signal 
tail. 

The term "unrecoverable energy" is justified by several facts. First, we must pay 
for the £2 error contaminating the samples. To check this point, define S = suppa;<j. 
The matrix $5 is nearly an isometry from £f to i"^, so an error in the large components 
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of the signal induces an error of equivalent size in the samples. Clearly, we can never 
resolve this uncertainty. 

The term ' 11 also required on account of classical results about the 

Gel'fand widths of the £f ball in due to Kashin [12] and Garnaev-Gluskin [30] . 
In the language of compressive sampling, their work has the following interpretation. 
Let $ be a fixed mx d sampling matrix. Suppose that, for every signal x G C^, there 
is an algorithm that uses the samples u = $x to construct an approximation a that 
achieves the error bound 

C 

Then the number m of measurements must satisfy m > cslog{d/ s). 
3.2.3 Proofs of Theorems 

Theorem 13.2.11 will be shown by demonstrating that the following iteration invariant 
holds. These results can be found in [53] . 

Theorem 3.2.3 (Iteration Invariant). For each iteration k > 0, the signal approxi- 
mation is s-sparse and 

\\x - a^+^||2 < 0.5||x - a^^ll^ + lOu. 

In particular, 

llx-a'^ll^ < 2-'=||a;||2 + 20z/. 

We will first show this holds for sparse input signals, and then derive the general 
case. 

When the sampling matrix satisfies the restricted isometry inequalities (12.1.41) . it 
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has several other properties that we require repeatedly in the proof that the CoSaMP 
algorithm is correct. Our first observation is a simple translation of (12.1. 4p into other 
terms, in the same light as Proposition 13.1.61 used in the proof of ROMP. 

Proposition 3.2.4. Suppose $ has restricted isometry constant 6r- Let T be a set 

of r indices or fewer. Then 

lt*HI.<7=ii"ii2 

where the last two statements contain an upper and lower bound, depending on the 
sign chosen. 

Proof. The restricted isometry inequalities (12.1.41) imply that the singular values of 
$T he between a/1 — Sr and \/l + Sr- The bounds follow from standard relationships 
between the singular values of $t and the singular values of basic functions of $t- CH 

A second consequence is that disjoint sets of columns from the sampling matrix 
span nearly orthogonal subspaces. The following result quantifies this observation. 

Proposition 3.2.5 (Approximate Orthogonality). Suppose $ has restricted isometry 
constant 6r. Let S and T be disjoint sets of indices whose combined cardinality does 
not exceed r. Then 

||$s*r|| < Sr. 

Proof. Abbreviate R = S UT, and observe that is a submatrix of — Id. 
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The spectral norm of a submatrix never exceeds the norm of the entire matrix. We 
discern that 

< Wr'^r - Id|| < max{(l + 5^) - 1, 1 - (1 - 5r)} = 5r 

because the eigenvalues of $^$_r he between 1 — 6r and 1 + 5^. □ 

This result will be applied through the following corollary. 

Corollary 3.2.6. Suppose $ has restricted isometry constant 6r- Let T be a set of 
indices, and let x be a vector. Provided that r > |r U suppx|, 



2 • 



Proof. Define S = suppx \ T, so we have x\s = x\t'=. Thus, 

11$:^$ • x\tc\\2 = W^T^ ■ X\s\\2 < II'^'t'^sII IkUlla < \\x\tA\2 ' 

owing to Proposition I3.2.5[ □ 

As a second corollary, we show that S2r gives weak control over the higher re- 
stricted isometry constants. 

Corollary 3.2.7. Let c and r be positive integers. Then 6cr ^ c ■ 62r- 

Proof. The result is clearly true for c = 1,2, so we assume c > 3. Let S be an 
arbitrary index set of size or, and let M = $g$s — Id. It suffices to check that 
||M|| < c ■ 62r- To that end, we break the matrix M into r x r blocks, which we 
denote Mij. A block version of Gershgorin's theorem states that ||M|| satisfies at 
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|||M|| - ||Mii||| < > ||Mij|| where z = 1,2, ... ,c. 

The derivation is entirely analogous with the usual proof of Gershgorin's theorem, 
so we omit the details. For each diagonal block, we have ||Mii|| < br because of 
the restricted isometry inequalities f l2.1.4p . For each off-diagonal block, we have 
||Mij|| < b^r because of Proposition 13.2.51 Substitute these bounds into the block 
Gershgorin theorem and rearrange to complete the proof. □ 

Finally, we present a result that measures how much the sampling matrix inflates 
nonsparse vectors. This bound permits us to establish the major results for sparse 
signals and then transfer the conclusions to the general case. 

Proposition 3.2.8 (Energy Bound). Suppose that $ verifies the upper inequality of 
(12X^1) . VIZ. 

W^xW^ < a/1 + 5r ||a^|l2 when ||a;||o ^ 
Then, for every signal x, 

W^XW^ < \/l + 6r 

Proof. First, observe that the hypothesis of the proposition can be regarded as a 
statement about the operator norm of $ as a map between two Banach spaces. For 
a set / C {1, 2, ... , A^}, write for the unit ball in C.2{I)- Define the convex body 



+ 



\x\ 
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and notice that, by hypothesis, the operator norm 
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Define a second convex body 



K = '{x : \\x\\2 + \\x\\^ < 1 1^ C C^, 



and consider the operator norm 



x&K 



The content of the proposition is the claim that 



|$||k^2 < 11^115-2 • 



To establish this point, it suffices to check that K G S. 

Choose a vector x E K. We partition the support of x into sets of size r. Let Iq 
index the r largest-magnitude components of x, breaking ties lexicographically. Let 
1 1 index the next largest r components, and so forth. Note that the final block Ij 
may have fewer than r components. We may assume that x\i. is nonzero for each j. 

This partition induces a decomposition 



where 
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By construction, each vector yj belongs to 5* because it is r-sparse and has unit £2 
norm. We will prove that Xj < I, which implies that x can be written as a convex 
combination of vectors from the set S. As a consequence, x G S*. It emerges that 
KcS. 

Fix j in the range {1, 2, . . . , J}. It follows that Ij contains at most r elements 
and contains exactly r elements. Therefore, 

Xj = \\x\i^\\^ <V^\\x\i^\\^ ^^'l ll^l Villi 

where the last inequality holds because the magnitude of x on the set Ij-i dominates 
its largest entry in Ij. Summing these relations, we obtain 




It is clear that Aq = ||a;|/oll2 — 11^112- conclude that 

EJ ^ „ „ 1 „ „ 
Xj < kc L H — \\x\\, < 1 
j=o ^ - ^ "1 - 

because x E K. □ 
Iteration Invariant: Sparse Case 

We now commence the proof of Theorem 13.2.31 For the moment, let us assume that 
the signal is actually sparse. We will remove this assumption later. 

The result states that each iteration of the algorithm reduces the approximation 
error by a constant factor, while adding a small multiple of the noise. As a con- 
sequence, when the approximation error is large in comparison with the noise, the 
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algorithm makes substantial progress in identifying the unknown signal. 

Theorem 3.2.9 (Iteration Invariant: Sparse Case). Assume that x is s-sparse 
each k >0, the signal approximation a*^ is s-sparse, and 

\\x - a'^^'^W^ < 0.5\\x - a''\\^ + 7.5 ||e||2 . 

In particular, 

\\x - a^||2 < 2"'' ||a;||2 + 15 ||e||2 • 

The argument proceeds in a sequence of short lemmas, each corresponding to one 
step in the algorithm. Throughout this section, we retain the assumption that x is 
s-sparse. 

Fix an iteration k > 1. We write a = a^^^ for the signal approximation at the 
beginning of the iteration. Define the residual r — x — a, which we interpret as the 
part of the signal we have not yet recovered. Since the approximation a is always s- 
sparse, the residual r must be 2s-sparse. Notice that the vector v of updated samples 
can be viewed as noisy samples of the residual: 

V u — — ^{x — a) + e — ^r + e. 

The identification phase produces a set of components where the residual signal 
still has a lot of energy. 

Lemma 3.2.10 (Identification). The set fl — suppy2s; where y — is the signal 
proxy, contains at most 2s indices, and 

||r|ac||2 < 0.2223 ||r||2 + 2.34 ||e||2. 
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. For 
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Proof. The identification phase forms a proxy y = $*f for the residual signal. The 
algorithm then selects a set Q of 2s components from y that have the largest magni- 
tudes. The goal of the proof is to show that the energy in the residual on the set Q'^ 
is small in comparison with the total energy in the residual. 

Define the set R = suppr. Since R contains at most 2s elements, our choice of Q 
ensures that ||y|K||2 < ||ybll2- By squaring this inequality and canceling the terms 
in RnQ, we discover that 

Since the coordinate subsets here contain few elements, we can use the restricted 
isometry constants to provide bounds on both sides. 

First, observe that the set Q\R contains at most 2s elements. Therefore, we may 
apply Proposition 13.2.41 and Corollary 13.2.61 to obtain 

\\y\n\R\\^ = \\^n\B{^r + e)\\^ 

< \\^h\R^42+\\^kR42 

< Sis \\r\\2 + a/1 + S2s ||e||2 • 

Likewise, the set R\Q contains 2s elements or fewer, so Proposition 13 .2 .41 and Corol- 
lary 13.2.61 yield 

||l/|i?\n||2 = \\^*R\ni^r + e) 

> \\^R\n^ ■ r\n\n 

> (1 - 52s)|HiJ\n 

Since the residual is supported on R, we can rewrite r|/j\f7 = r\nc. Finally, combine 



- §28 \\r\\2 - a/1 + ||e||2 • 
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the last three inequalities and rearrange to obtain 




{S2s + S4s) \\r\\2 + 2V1 + 62s \\e 

1-52S 



Invoke the numerical hypothesis that 52s < < 0.1 to complete the argument. □ 

The next step of the algorithm merges the support of the current signal approx- 
imation a with the newly identified set of components. The following result shows 
that components of the signal x outside this set have very little energy. 

Lemma 3.2.11 (Support Merger). Let fl be a set of at most 2s indices. The set 
T — flu suppa contains at most 3s indices, and 



The estimation step of the algorithm solves a least-squares problem to obtain 
values for the coefficients in the set T. We need a bound on the error of this approx- 
imation. 

Lemma 3.2.12 (Estimation). Let T be a set of at most 3s indices, and define the 
least-squares signal estimate b by the formulae 



^ I 1 1 2 — I r I fJ'^ 1 1 2 ■ 



Proof. Since suppa C T, we find that 




where the inequality follows from the containment T'^ C fl'^. 



□ 



6|t = 



and b\Tc = 0, 
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where u = (^x + e. Then 
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\\x-h\\^ < 1.112 ||a;|Tc||2 + 1.06 ||e||2. 

This result assumes that we solve the least-squares problem in infinite precision. 
In practice, the right-hand side of the bound contains an extra term owing to the 
error from the iterative least-squares solver. Below, we study how many iterations of 
the least-squares solver are required to make the least-squares error negligible in the 
present argument. 

Proof. Note first that 

\\x — &II2 < ||3;|r<:||2 + II^^It — &|t||2 • 
Using the expression u = $x + e and the fact ^^^^ = Idy, we calculate that 

||a;|T - &|t||2 = II^^It - ■ x\t + ^ ■ x\t'^ + e)\\^ 

= ■ x\tc + e)\\^ 

The cardinality of T is at most 3s, and x is s-sparse, so Proposition 13.2.41 and Corol- 
lary 13.2.61 imply that 

-L - Vl - (>3s 

5ao llelL 
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Combine the bounds to reach 
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||a;-6||2< 1 + \\x\t42 + ^=^- 

Finally, invoke the hypothesis that < < 0.1. □ 

The final step of each iteration is to prune the intermediate approximation to its 
largest s terms. The following lemma provides a bound on the error in the pruned 
approximation. 

Lemma 3.2.13 (Pruning). The pruned approximation hs satisfies 

\\x — hsW^ < 2 ||x — 6II2 . 
Proof. The intuition is that hg is close to 6, which is close to x. Rigorously, 

— &s|l2 ^ Ik ~ ^Il2 + 11^ ~ ^sll2 ^ 2 ||x — 6II2 . 

The second inequality holds because hs is the best s-sparse approximation to h. In 
particular, the s-sparse vector x is a worse approximation. □ 

We now complete the proof of the iteration invariant for sparse signals. Theo- 
rem I3.2.9[ At the end of an iteration, the algorithm forms a new approximation 
= bg, which is evidently s-sparse. Applying the lemmas we have established, we 
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easily bound the error: 
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1 1 X CL 1 1 2 — 1 1 ^ s 1 1 2 

< 2 ||a; — 6II2 Pruning (Lemma l3.2.13p 

< 2 ■ (1.112 ||a;|T-||2 + 1-06 Ijeiy Estimation (Lemma [SXID 

< 2.224 ||r|nc||2 + 2.12 ||e||2 Support merger (Lemma 13.2. lip 

< 2.224 ■ (0.2223 ||r||2 + 2.34 Heiy + 2.12 ||e||2 Identification (Lemma [HXIO]) 

< 0.5 ||r||2 + 7.5 ||e||2 

= 0.5||x - a^'^W^ + 7.5 ||e||2 . 

To obtain the second bound in Theorem I3.2.9[ simply solve the error recursion and 
note that 

(1 + 0.5 + 0.25 + ...)■ 7.5 ||e||2 < 15 ||e||2 . 

This point completes the argument. 

Before extending the iteration invariant to the sparse case, we first analyze in 
detail the least-sqaures step. This will allow us to completely prove our main result, 
Theorem 13.2.11 

Least Squares Analysis 

To develop an efficient implementation of CoSaMP, it is critical to use an iterative 
method when we solve the least-squares problem in the estimation step. Here we 
analyze this step for the noise-free case. The two natural choices are Richardson's 
iteration and conjugate gradient. The efficacy of these methods rests on the assump- 
tion that the sampling operator has small restricted isometry constants. Indeed, since 
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the set T constructed in the support merger step contains at most 3s components, 
the hypothesis < 0.1 ensures that the condition number 

This condition number is closely connected with the performance of Richardson's 
iteration and conjugate gradient. In this section, we show that Theorem 13.2.91 holds 
if we perform a constant number of iterations of either least-squares algorithm. 

For completeness, let us explain how Richardson's iteration can be applied to 
solve the least-squares problems that arise in CoSaMP. Suppose we wish to compute 
A^^u where A is a tall, full-rank matrix. Recalling the definition of the pseudoinverse, 
we realize that this amounts to solving a linear system of the form 

(A*A)b = A*u. 

This problem can be approached by splitting the Gram matrix: 

A* A = Id+M 

where M = A* A — Id. Given an initial iterate z^, Richardon's method produces the 
subsequent iterates via the formula 

= A*u-Mz^ 

Evidently, this iteration requires only matrix-vector multiplies with A and A*. It 
is worth noting that Richardson's method can be accelerated [2, Sec. 7.2.5], but we 
omit the details. 
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It is quite easy to analyze Richardson's iteration [21 Sec. 7.2.1]. Observe that 



- A^u\\^ = ||M(z^ - A^u)\\^ < ||M|| ||/ - A+ujl^. 

This recursion delivers 

||/ - A+ujl^ < ||Mf 11^° - A^u\\^ for £ = 0, 1, 2, ... . 

In words, the iteration converges linearly. 

In our setting, A = $t where T is a set of at most 3s indices. Therefore, the 
restricted isometry inequalities fl2.1.4p imply that 

||M|| = ||$^$T-Id|| < 53s. 

We have assumed that 63s < < 0.1, which means that the iteration converges 
quite fast. Once again, the restricted isometry behavior of the sampling matrix plays 
an essential role in the performance of the CoSaMP algorithm. 

Conjugate gradient provides even better guarantees for solving the least-squares 
problem, but it is somewhat more complicated to describe and rather more difficult 
to analyze. We refer the reader to [2], Sec. 7.4] for more information. The follow- 
ing lemma summarizes the behavior of both Richardson's iteration and conjugate 
gradient in our setting. 

Lemma 3.2.14 (Error Bound for LS). Richardson's iteration produces a sequence 
{z^} of iterates that satisfy 

||/ - "^^uW^ < 0.1^ ■ \\z° - for i = 0,1,2,.... 
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Conjugate gradient produces a sequence of iterates that satisfy 
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<l>T.n|| <2- p' -llz^ - for £ = 0,1, 2, 



where 



This can even be improved further if the eigenvalues of are clustered [M] . 

Iterative least-squares algorithms must be seeded with an initial iterate, and their 
performance depends heavily on a wise selection thereof. CoSaMP offers a natural 
choice for the initializer: the current signal approximation. As the algorithm pro- 
gresses, the current signal approximation provides an increasingly good starting point 
for solving the least-squares problem. The following shows that the error in the initial 
iterate is controlled by the current approximation error. 

Lemma 3.2.15 (Initial Iterate for LS). Letx he an s-sparse signal with noisy samples 
u = $x + e. Let a^~^ he the signal approximation at the end of the {k — l)th iteration, 
and let T he the set of components identified hy the support merger. Then 



\a''-^ - ^l^u\\^ < 2.112||a; - a^'\ + 1.06 ||e||2 



Proof. By construction of T, the approximation ^ is supported inside T, so 



Using Lemma 13.2.121 we may calculate how far a'^ ^ lies from the solution to the 
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least-squares problem. 
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_ < \\x - a'^'^W^ + \\x - ^T^\\2 

< \\x - a''-^\\^ + 1.112||a;|Tc||2 + 1.06 ||e||2 

< 2.112||x - a''~^\^ + 1.06 ||e||2. 

□ 

We need to determine how many iterations of the least-squares algorithm are 
required to ensure that the approximation produced is sufficiently good to support 
the performance of CoSaMP. 

Corollary 3.2.16 (Estimation by Iterative LS). Suppose that we initialize the LS 
algorithm with z^ = a'^^^ . After at most three iterations, both Richardson's iteration 
and conjugate gradient produce a signal estimate b that satisfies 

\\x - bL < 1.112||x|t-|L + 0.0022||x - a^'^^IL + 1.062 lleL . 

M W z M'Mz II llz iiiiz 

Proof. Combine Lemma 13.2.141 and Lemma 13.2.151 to see that three iterations of 
Richardson's method yield 

\\z^ - <I>J.m||2 < 0.002112||a; - a^'^\\^ + 0.00106 ||e||2 . 

The bound for conjugate gradient is slightly better. Let 6|t = z^. According to the 
estimation result, Lemma [3.2.121 we have 

||x - ^tu\\^ < 1.112||x|t-||2 + ^-^^ 11^112 • 
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An application of the triangle inequality completes the argument. □ 

Finally, we need to check that the sparse iteration invariant, Theorem 13.2.91 still 
holds when we use an iterative least-squares algorithm. 

Theorem 3.2.17 (Sparse Iteration Invariant with Iterative LS). Suppose that we use 
Richardson's iteration or conjugate gradient for the estimation step, initializing the 
LS algorithm with the current approximation a'^"^ and performing three LS iterations. 
Then Theorem \3.2.9\ still holds. 

Proof. We repeat the calculation from the above case using Corollary 13.2.161 instead 
of the simple estimation lemma. To that end, recall that the residual r = x — a^~^. 
Then 

\\x - a^W^ < 2 ||x - 6II2 

< 2 • (1.112 \\x\t42 + 0-0022 ||r||2 + 1.062 \\e\\^) 

< 2.224 ||r|f^c||2 + 0.0044 ||r||2 + 2.124 ||e||2 

< 2.224 ■ (0.2223 ||r||2 + 2.34 \\e\\^) + 0.0044 ||r||2 + 2.124 ||e||2 

< 0.5 ||r||2 + 7.5 ||e||2 

= 0.5||2; - a^'^W^ + 7.5 ||e||2 . 

This bound is precisely what is required for the theorem to hold. □ 

We are now prepared to extend the iteration invariant to the general case, and 
prove Theorem 13. 2. 1[ 



3.2. Compressive Sampling Matching Pursuit 



94 



Extension to General Signals 

In this section, we finally complete the proof of the main result for CoSaMP, Theo- 
rem [3231 The remaining challenge is to remove the hypothesis that the target signal 
is sparse, which we framed in Theorems 13.2.91 and 13.2.171 Although this difficulty 
might seem large, the solution is simple and elegant. It turns out that we can view 
the noisy samples of a general signal as samples of a sparse signal contaminated with 
a different noise vector that implicitly reflects the tail of the original signal. 

Lemma 3.2.18 (Reduction to Sparse Case). Let x he an arbitrary vector in C^. 
The sample vector u = $x + e can also be expressed as u = + e where 



|e||2 < 1.05 



1 n 

rf I I rf rp I 

I ^ 1 1 2 / — I I ^ \ \\ 



+ I|e|l2 



Proof. Decompose x = Xs + {x — Xs) to obtain u = $Xs + e where e = ^{x — Xg) + e. 
To compute the size of the error term, we simply apply the triangle inequality and 
Proposition I3.2.8t 



1 n 

I iXj 5 1 1 2 ' J — 1 1 s \ \ ^ 

V"5 



+ e 



Finally, invoke the fact that 5s < S^s ^ 0.1 to obtain a/1 + 5s < 1.05. □ 

This lemma is just the tool we require to complete the proof of Theorem 13. 2. 3[ 

Proof of Theorem \3.2.3[ Let a; be a general signal, and use Lemma 13.2.181 to write 
the noisy vector of samples u = ^Xs + e. Apply the sparse iteration invariant. 
Theorem 13.2.91 or the analog for iterative least-squares. Theorem 13.2.171 We obtain 

\\xs~a''+^L<0.5\\xs-a''L + 7.5\\e 
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Invoke the lower and upper triangle inequalities to obtain 
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\x - a < 0.5||s - a^ll^ + 7.5 ||e||2 + 1.5 ||a; - XsW^ 



Finally, recall the estimate for ||e||2 from Lemma [3.2.181 and simplify to reach 

„ 7.875 



-,k+l\ 



1 1 ~|~ 7*5 1 1 6 1 



where v is the unrecoverable energy (13.2.11) . □ 

We have now collected all the material we need to establish the main result. Fix 
a precision parameter 77. After at most 0(log(||x||2 /fi)) iterations, CoSaMP produces 
an s-sparse approximation a that satisfies 



a||2 < C ■ (?7 + z/) 



in consequence of Theorem 13 . 2 . 31 Apply inequality (13.2.21) to bound the unrecoverable 
energy v in terms of the ^l norm. We see that the approximation error satisfies 

\\x — a\\2 < C ■ max |?7, —j= ||x — a;s/2||-,^ + ||e||2 . 

According to Theorem 13. 2. 23[ each iteration of CoSaMP is completed in time 0(=Sf), 
where ^ bounds the cost of a matrix-vector multiplication with $ or $*. The total 
runtime, therefore, is 0(^log(||a;||2 /??))• The total storage is O(A^). 

Finally, in the statement of the theorem, we replace with by means of 
Corollary 13.2.71 which states that 5cr < c - for any positive integers c and r. 
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Iteration Count for Exact Arithmetic 

As promised, we now provide an iteration count for CoSaMP in the case of exact 
arithmetic. These results can be found in [52] . 

We obtain an estimate on the number of iterations of the CoSaMP algorithm 
necessary to identify the recoverable energy in a sparse signal, assuming exact arith- 
metic. Except where stated explicitly, we assume that x is s-sparse. It turns out that 
the number of iterations depends heavily on the signal structure. Let us explain the 
intuition behind this fact. 

When the entries in the signal decay rapidly, the algorithm must identify and re- 
move the largest remaining entry from the residual before it can make further progress 
on the smaller entries. Indeed, the large component in the residual contaminates each 
component of the signal proxy. In this case, the algorithm may require an iteration 
or more to find each component in the signal. 

On the other hand, when the s entries of the signal are comparable, the algorithm 
can simultaneously locate many entries just by reducing the norm of the residual 
below the magnitude of the smallest entry. Since the largest entry of the signal has 
magnitude at least times the £2 norm of the signal, the algorithm can find all s 
components of the signal after about log s iterations. 

To quantify these intuitions, we want to collect the components of the signal into 
groups that are comparable with each other. To that end, define the component bands 
of a signal x by the formulae 

Bj = {t : 2~(^'+^) ||x||2 < \xif < 2-^ \\x\\l} for j = 0,1,2,.... (3.2.3) 
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The profile of the signal is the number of bands that are nonempty: 
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profile(x) = #{j : Bj ^ 0}. 

In words, the profile counts how many orders of magnitude at which the signal has 
coefficients. It is clear that the profile of an s-sparse signal is at most s. 
See Figure 13.2.11 for images of stylized signals with different profiles. 



lit?.... 



Figure 3.2.1: Illustration of two unit-norm signals with sharply different profiles (left: 
low, right: high). 

First, we prove a result on the number of iterations needed to acquire an s-sparse 
signal. At the end of the section, we extend this result to general signals, which yields 
Theorem 13.2.21 

Theorem 3.2.19 (Iteration Count: Sparse Case). Let x be an s-sparse signal, and 
define p = profile(x). After at most 



plog4/3(l + 4-6v/s7p) + 6 
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iterations, CoSaMP produces an approximation a that satisfies 
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||x — a||2 < 17 ||e||2 . 

For a fixed s, the bound on the number of iterations achieves its maximum value 
at p = s. Since log4/3 ^-6 < 6, the number of iterations never exceeds 6(s + 1). 

Let us instate some notation that will be valuable in the proof of the theorem. 
We write p — profile(x). For each /c = 0, 1, 2, . . . , the signal al^ is the approximation 
after the fcth iteration. We abbreviate Sk = suppa'^, and we define the residual signal 

= X — o}^. The norm of the residual can be viewed as the approximation error. 

For a nonnegative integer j, we may define an auxiliary signal 

^ def I 

In other words, is the part of x contained in the bands Bj, Bj+i, Bj^2, For 

each j e J, we have the estimate 

lbl2<E.>,2-NkllM^^I (3-2.4) 

by definition of the bands. These auxiliary signals play a key role in the analysis. 

The proof of the theorem involves a sequence of lemmas. The first object is 
to establish an alternative that holds in each iteration. One possibility is that the 
approximation error is small, which means that the algorithm is effectively finished. 
Otherwise, the approximation error is dominated by the energy in the unidentified 
part of the signal, and the subsequent approximation error is a constant factor smaller. 

Lemma 3.2.20. For each iteration k = 0, 1, 2, ... , at least one of the following 
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alternatives holds. Either 

llr'^lla < 70 ||e||2 (3.2.5) 

or else 

llr^ll^ < 2.3||x|5c||^ and (3.2.6) 

||r^+^||2 < 0.75||r^||2. (3.2.7) 

Proof. Define as the merged support that occurs during iteration k. The pruning 
step ensures that the support Sk of the approximation at the end of the iteration is 
a subset of the merged support, so 

||a;|T^-||2 < Ikl^^lls for = 1,2,3, ... . 

At the end of the kth iteration, the pruned vector bg becomes the next approximation 
a'', so the estimation and pruning resuhs. Lemmas 13.2.121 and 13.2.131 together imply 
that 

llr'^IL < 2 • (1.112||x|t-|I + 1.06 lleL) 

< 2.224||a;|5=||2 + 2.12 ||e||2 for = 1, 2, 3, . . . . (3.2.8) 

Note that the same relation holds trivially for iteration k = because r° = x and 
So = 0. 

Suppose that there is an iteration k > where 

||a;|5dl2<30||e||2. 
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We can introduce this bound directly into the inequality (I3.2.8P to obtain the first 
conclusion fl3.2.5p . 

Suppose on the contrary that in iteration k we have 

lla^ls^lls >30||e||2. 

Introducing this relation into the inequality f l3.2.8p leads quickly to the conclusion 
fl3.2.6p . We also have the chain of relations 

Ik'^ll^ > V\s£ = \\ix- a'^)\sc^l = \\x\s^X ^ 30 \\e\\, . 

Therefore, the sparse iteration invariant, Theorem 13.2.91 ensures that fl3.2.7p holds. 

□ 

The next lemma contains the critical part of the argument. Under the second 
alternative in the previous lemma, we show that the algorithm completely identifies 
the support of the signal, and we bound the number of iterations required to do so. 

Lemma 3.2.21. Fix K = [plog4/3(l + 4.6v/s7p)J ■ Assume that ^i:^ and (13X7]) 

are in force for each iteration k = 0,1,2, K . Then suppa^^ = suppx. 

Proof. First, we check that, once the norm of the residual is smaller than each element 
of a band, the components in that band persist in the support of each subsequent 
approximation. Define J to be the set of nonempty bands, and fix a band j G J. 
Suppose that, for some iteration k, the norm of the residual satisfies 

||^fc||^<2-0-+i)/2||a;||^. (3.2.9) 
Then it must be the case that Bj C suppa'^. If not, then some component i G Bj 
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appears in the residual: rf = Xi. This supposition implies that 
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Ir'^IL > \xA > 2-(j'+i)/2 ii^i 



2 ' 



an evident contradiction. Since (13.2.71) guarantees that the norm of the residual 
declines in each iteration, (13.2.91) ensures that the support of each subsequent ap- 
proximation contains Bj. 

Next, we bound the number of iterations required to find the next nonempty 
band Bj, given that we have already identified the bands Bi where i < j. Formally, 
assume that the support Sk of the current approximation contains B^ for each i < j. 
In particular, the set of missing components 5*^ C suppy-'. It follows from relation 
fl3X6ll that 

llr'^ll <2.3||y^1l . 



We can conclude that we have identified the band Bj in iteration k + i ii 



|^fc+,||^^2-0-+l)/2||a;||2. 



According to (13.2.71) . we reduce the error by a factor of /3 ^ = 0.75 during each 
iteration. Therefore, the number i of iterations required to identify Bj is at most 



log 



2.3 y\ 



2-(j+l)/2 



We discover that the total number of iterations required to identify all the (nonempty) 
bands is at most 



2.3- 



\x\ 
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For each iteration k > \_k^\ , it follows that suppa^ = suppx. 

It remains to bound fc^ in terms of the profile p of the signal. For convenience, we 
focus on a slightly different quantity. First, observe that p = \J\. Using the geometric 
mean-arithmetic mean inequality, we discover that 



exp 



p ^jeJ 



log 



2.3 ■ 



20-+i)/2||y^-||, 



\x\ 



< exp ^^log ( 1 + 2.3- 

<-y (1 + 2.3- 



2(^-+i)/2||t/^-||2 



\x\ 



2a + l)/2 \\yi 



1 + 

P ^3 



1/2 



6J 



\X\ 



To bound the remaining sum, we recall the relation fl3.2.4p . Then we invoke Jensen's 
inequality and simplify the result. 



p 



i6J 



j ~ p ^ifiJ V ^i>3 '7 ~ \p ^ie-' 

<,(\y \B-\y 2^'"*+^V^% T-'V |5|V^^ 



l^flfp. 



The final equality holds because the total number of elements in all the bands equals 
the signal sparsity s. Combining these bounds, we reach 



exp < - lo£ 



2.3- 



2(i+i)/2 



Ir II2 



\x\ 



< l + 4:.6^/sJp. 



Take logarithms, multiply by p, and divide through by log (3. We conclude that the 
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h < plog^(l + 4.6a/s7p). 

This is the advertised conclusion. □ 

Finally, we check that the algorithm produces a small approximation error within 
a reasonable number of iterations. 

Proof of TheoremlMIM Abbreviate = \j) \og{l+ A. 6 ^/s/p)\. Suppose that (EXSl) 
never holds during the first K iterations of the algorithm. Under this circumstance, 
Lemma 13.2.201 implies that both (13.2.61) and (13.2.71) hold during each of these K iter- 
ations. It follows from Lemma [3.2.211 that the support Sk of the Kth approximation 
equals the support of x. Since Sk is contained in the merged support T^, we see that 
the vector = 0. Therefore, the estimation and pruning results. Lemmas 13.2.121 
and 13.2.131 show that 

||r^||2 < 2 ■ (^1.112 \\x\t-J\^ + 1.06 ||e||2) = 2.12 ||e||2 . 

This estimate contradicts (13.2.51) . 

It follows that there is an iteration k < K where (13.2.51) is in force. Repeated 
applications of the iteration invariant. Theorem I3.2.9[ allow us to conclude that 

||r^+^||2 < 17||e||2. 



This point completes the argument. 

Finally, we extend the sparse iteration count result to the general case. 



□ 
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Theorem 3.2.22 (Iteration Count). Let x be an arbitrary signal, and define p = 
profile(xs). After at most 



plog4/3(l + 4-6Vs/p) + 6 
iterations, CoSaMP produces an approximation a that satisfies 



\x — a||2 < 20 ||e||2 



Proof. Let x be a general signal, and let p = profile(xs). Lemma 13.2.181 allows us 
to write the noisy vector of samples u = ^Xg + e. The sparse iteration count result, 
Theorem 13.2.191 states that after at most 



plog4/3(l + 4-6Vs/p) + 6 
iterations, the algorithm produces an approximation a that satisfies 



\xs — a||2 ^ 17 ||e||2 . 



Apply the lower triangle inequality to the left-hand side. Then recall the estimate 
for the noise in Lemma 13.2. 18[ and simplify to reach 

1 1 ^ 1 1 2 17 11^112 I 1 1 ^ ^ s 1 1 2 



17.9 „ „ „ , 

< 18.9 \\x — Xs\\2 H ^ If — a^slli + 17 ||e| 



< 20z/, 



where u is the unrecoverable energy. 



□ 
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Proof of Theorem \3.2.2[ Invoke Theorem I3.2.22[ Recall that the estimate for the 
number of iterations is maximized with p = s, which gives an upper bound of 6(s + 1) 
iterations, independent of the signal. □ 

3.2.4 Implementation and Runtime 

CoSaMP was designed to be a practical method for signal recovery. An efficient 
implementation of the algorithm requires some ideas from numerical linear algebra, 
as well as some basic techniques from the theory of algorithms. This section discusses 
the key issues and develops an analysis of the running time for the two most common 
scenarios. 

We focus on the least-squares problem in the estimation step because it is the 
major obstacle to a fast implementation of the algorithm. The algorithm guarantees 
that the matrix $t never has more than 3s columns, so our assumption 64s < 0.1 
implies that the matrix $t is extremely well conditioned. As a result, we can apply 
the pseudoinverse = ($^$2^)^^$^ very quickly using an iterative method, such 
as Richardson's iteration ^ Sec. 7.2.3] or conjugate gradient [21 Sec. 7.4]. These 
techniques have the additional advantage that they only interact with the matrix $7- 
through its action on vectors. It follows that the algorithm performs better when the 
sampling matrix has a fast matrix-vector multiply. 

Let us stress that direct methods for least squares are likely to be extremely inef- 
ficient in this setting. The first reason is that each least-squares problem may contain 
substantially different sets of columns from $. As a result, it becomes necessary to 
perform a completely new QR or SVD factorization during each iteration at a cost 
of O(s^m). The second problem is that computing these factorizations typically re- 
quires direct access to the columns of the matrix, which is problematic when the 
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matrix is accessed through its action on vectors. Third, direct methods have storage 
costs 0(sm), which may be deadly for large-scale problems. 

The remaining steps of the algorithm involve standard techniques. Let us estimate 
the operation counts. 

Proxy Forming the proxy is dominated by the cost of the matrix- vector multiply 

Identification We can locate the largest 2s entries of a vector in time O(A^) using 
the approach in [TTl Ch. 9]. In practice, it may be faster to sort the entries of 
the signal in decreasing order of magnitude at cost O(A^logA^) and then select 
the first 2s of them. The latter procedure can be accomplished with quicksort, 
mergesort, or heapsort pTl Sec. II]. To implement the algorithm to the letter, 
the sorting method needs to be stable because we stipulate that ties are broken 
lexicographically. This point is not important in practice. 

Support Merger We can merge two sets of size 0(s) in expected time 0(s) using 
randomized hashing methods pTl Ch. 11]. One can also sort both sets first and 
use the elementary merge procedure |Tl], p. 29] for a total cost O(slogs). 

LS estimation We use Richardson's iteration or conjugate gradient to compute 
Initializing the least-squares algorithm requires a matrix-vector multiply 
with Each iteration of the least-squares method requires one matrix-vector 
multiply each with $r and Since is a submatrix of $, the matrix-vector 
multiplies can also be obtained from multiplication with the full matrix. We 
proved above that a constant number of least-squares iterations suffices for 
Theorem [323] to hold. 

Pruning This step is similar to identification. Pruning can be implemented in time 
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0(s), but it may be preferable to sort the components of the vector by magni- 
tude and then select the first s at a cost of 0(s logs). 

Sample Update This step is dominated by the cost of the multiplication of $ with 

the s-sparse vector a'^. 

Table 13.11 summarizes this discussion in two particular cases. The first column 
shows what happens when the sampling matrix $ is applied to vectors in the standard 
way, but we have random access to submatrices. The second column shows what 
happens when the sampling matrix $ and its adjoint $* both have a fast multiply 
with cost where we assume that ^ > N. A typical value is =Sf = O(A^logA^). In 
particular, a partial Fourier matrix satisfies this bound. 

Table 3.1: Operation count for CoSaMP. Big-0 notation is omitted for legibility. The 
dimensions of the sampling matrix $ are m x N; the sparsity level is s. The number 
bounds the cost of a matrix-vector multiply with $ or $*. 



Step 


Standard multiply 


Fast multiply 


Form proxy 


mN 




Identification 


N 


N 


Support merger 


s 


s 


LS estimation 


sm 




Pruning 


s 


s 


Sample update 


sm 




Total per iteration 


0{mN) 





Finally, we note that the storage requirements of the algorithm are also favorable. 
Aside from the storage required by the sampling matrix, the algorithm constructs 
only one vector of length A^, the signal proxy. The sample vectors u and v have 
length m, so they require 0(m) storage. The signal approximations can be stored 
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using sparse data structures, so they require at most O(slogA^) storage. Similarly, 
the index sets that appear require only O(slogA^) storage. The total storage is at 
worst O(A^). 

The following result summarizes this discussion. 

Theorem 3.2.23 (Resource Requirements). Each iteration of CoSaMP requires 0(=Sf ) 
time, where =Sf bounds the cost of a multiplication with the matrix $ or $* . The al- 
gorithm uses storage 0{N). 

Algorithmic Variations 

This section describes other possible halting criteria and their consequences. It also 
proposes some other variations on the algorithm. 

There are three natural approaches to halting the algorithm. The first, which we 
have discussed in the body of the paper, is to stop after a fixed number of iterations. 
Another possibility is to use the norm ||f of the current samples as evidence about 
the norm ||r||2 of the residual. A third possibility is to use the magnitude \\y\\^ of 
the entries of the proxy to bound the magnitude of the entries of the residual. 

It suffices to discuss halting criteria for sparse signals because Lemma [3.2.181 shows 
that the general case can be viewed in terms of sampling a sparse signal. Let x be 
an s-sparse signal, and let a be an s-sparse approximation. The residual r = x — a. 
We write f = $r + e for the induced noisy samples of the residual and y = $*f for 
the signal proxy. 

The discussion proceeds in two steps. First, we argue that an a priori halting 
criterion will result in a guarantee about the quality of the final signal approximation. 
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Theorem 3.2.24 (Hahing I). The halting criterion \\v\\2 < £ ensures that 



\x — a||2 < 1-06 ■ {e + WeW^) 



The halting criterion \\y\\^ < r]/\^ ensures that 

||a;-a||^ < 1.127] + 1.17 ||e||2 . 
Proof. Since r is 2s-sparse, Proposition 13.2.41 ensures that 



^/l- 62s \\r\\2 - ||e||2 < ||'y||2 • 



If 11^112 ^ ^! it is immediate that 



I II / ^ + ^ 2 
I' II2 



The definition r = x — a and the numerical bound 62s < < 0.1 dispatch the first 
claim. 

Let R = suppr, and note that \R\ < 2s. Proposition 13.2.41 results in 



^1 - S2s) \\r\\2 - V 1 + '^2s ||e||2 < \\y\ 



R\\2 ■ 



Since 

WvIrL < IbUJIoo < \\y\\oo ' 



we find that the requirement \\y\\^ < rj/V^s ensures that 



I II ^ V+ VI + ||e||2 
r < — 



1-5. 



2s 
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The numerical bound 62s < 0.1 completes the proof. □ 

Second, we check that each halting criterion is triggered when the residual has 
the desired property. 

Theorem 3.2.25 (Halting II). The halting criterion ||f < e is triggered as soon as 

\\x — a\\2 < 0.95 ■ {e — ||e||2). 

The halting criterion \\y\\^ < r]/^/2s is triggered as soon as 

0.45r/ O.eSllelL 
ll--a|L<^ 

Proof. Proposition 13.2.41 shows that 

\\v\\2 < + ^2s \\r\\2 + ||e||2 • 



Therefore, the condition 



^ — e L 

I' II2 ^ 



VT+S2S 

ensures that ||f II2 < e. Note that < 0.1 to complete the first part of the argument. 

Now let R be the singleton containing the index of a largest-magnitude coefficient 
of y. Proposition 13.2.41 implies that 

\\y\L = \\y\R\\2<V^^i\H2- 

By the first part of this theorem, the halting criterion \\y\\^ < ri/\/2s is triggered as 
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soon as 

||x-a|L < 0.95- ( , ^ llelL 1 . 

Since a; — a is 2s-sparse, we have the bound \\x — aW^ < \\x — cl\\^. To wrap up, 
recall that 6i < 62s <0.1. □ 

Next we discuss other variations of the algorithm. 

Here is a version of the algorithm that is, perhaps, simpler than that described 
above. At each iteration, we approximate the current residual rather than the entire 
signal. This approach is similar to HHS pursuit [32]. The inner loop changes in the 
following manner. 

Identification As before, select Q = supp?/2s- 

Estimation Solve a least-squares problem with the current samples instead of the 
original samples to obtain an approximation of the residual signal. Formally, 
b = . In this case, one initializes the iterative least-squares algorithm with 
the zero vector to take advantage of the fact that the residual is becoming small. 

Approximation Merger Add this approximation of the residual to the previous 
approximation of the signal to obtain a new approximation of the signal: c = 

a'^-i + b. 

Pruning Construct the s-sparse signal approximation: a'^ = Cg. 

Sample Update Update the samples as before: v = u — (^a^. 

By adapting the argument in this paper, we have been able to show that this 
algorithm also satisfies Theorem 13.2.11 We believe this version is quite promising for 
applications. 



3.2. Compressive Sampling Matching Pursuit 112 

An alternative variation is as follows. After the inner loop of the algorithm is 
complete, we can solve another least-squares problem in an effort to improve the 
final result. If a is the approximation at the end of the loop, we set T = suppa. Then 
solve b = ^^pU and output the s-sparse signal approximation b. Note that the output 
approximation is not guaranteed to be better than a because of the noise vector e, 
but it should never be much worse. 

Another variation is to prune the merged support T down to s entries before 
solving the least-squares problem. One may use the values of the proxy y as surrogates 
for the unknown values of the new approximation on the set Q. Since the least-squares 
problem is solved at the end of the iteration, the columns of $ that are used in the 
least-squares approximation are orthogonal to the current samples v. As a result, 
the identification step always selects new components in each iteration. We have not 
attempted an analysis of this algorithm. 

3.2.5 Numerical Results 

Noiseless Numerical Studies 

This section describes our experiments that illustrate the signal recovery power of 
CoSaMP. See Section IA.4I for the Matlab code used in these studies. We experi- 
mentally examine how many measurements m are necessary to recover various kinds 
of s-sparse signals in M'^ using ROMP. We also demonstrate that the number of it- 
erations CoSaMP needs to recover a sparse signal is in practice at most linear the 
sparsity, and in fact this serves as a successful halting criterion. 

First we describe the setup of our experiments. For many values of the ambient 
dimension d, the number of measurements m, and the sparsity s, we reconstruct 
random signals using CoSaMP. For each set of values, we generate an m x Gaussian 
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measurement matrix $ and then perform 500 independent trials. The results we 
obtained using Bernoulli measurement matrices were very similar. In a given trial, 
we generate an s-sparse signal x in one of two ways. In either case, we first select the 
support of the signal by choosing s components uniformly at random (independent 
from the measurement matrix $). In the cases where we wish to generate flat signals, 
we then set these components to one. In the cases where we wish to generate sparse 
compressible signals, we set the i*^ component of the support to plus or minus 
for a specified value of < p < 1. We then execute CoSaMP with the measurement 
vector u = ^x. 

Figure 13.2.21 depicts the percentage (from the 500 trials) of sparse fiat signals that 
were reconstructed exactly. This plot was generated with d = 256 for various levels 
of sparsity s. The horizontal axis represents the number of measurements m, and the 
vertical axis represents the exact recovery percentage. 

Figure 13.2.31 depicts a plot of the values for m and s at which 99% of sparse fiat 
signals are recovered exactly. This plot was generated with d = 256. The horizontal 
axis represents the number of measurements m, and the vertical axis the sparsity 
level s. 

Our results guarantee that CoSaMP reconstructs signals correctly with just 0{s) 
iterations. Figure 13.2.41 depicts the number of iterations needed by CoSaMP for 
d = 10, 000 and m = 200 for perfect reconstruction. CoSaMP was executed under the 
same setting as described above for sparse fiat signals as well as sparse compressible 
signals for various values of p, and the number of iterations in each scenario was 
averaged over the 500 trials. These averages were plotted against the sparsity of the 
signal. The plot demonstrates that often far fewer iterations are actually needed in 
some cases. This is not surprising, since as we discussed above alternative halting 
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criteria may be better in practice. The plot also demonstrates that the number 
of iterations needed for sparse compressible is higher than the number needed for 
sparse flat signals, as one would expect. The plot suggests that for smaller values of 
p (meaning signals that decay more rapidly) CoSaMP needs more iterations. 



Percentage of input signals recovered exactly (d=256) (Gaussian) 




100 150 
Number of measurements (m) 



250 



Figure 3.2.2: The percentage of sparse fiat signals exactly recovered by CoSaMP as 
a function of the number of measurements in dimension d = 256 for various levels of 
sparsity. 



Noisy Numerical Studies 

This section describes our numerical experiments that illustrate the stability of CoSaMP. 
We study the recovery error using CoSaMP for both perturbed measurements and 
signals. The empirical recovery error confirms that given in the theorems. 

First we describe the setup to our experimental studies. We run CoSaMP on 
various values of the ambient dimension d, the number of measurements m, and the 
sparsity level s, and attempt to reconstruct random signals. For each set of param- 
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99% Recovery Trend, d=256 
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Figure 3.2.3: The 99% recovery hmit as a function of the sparsity and the number of 
measurements for sparse flat signals. 

eters, we perform 500 trials. Initially, we generate an m x d Gaussian measurement 
matrix $. For each trial, independent of the matrix, we generate an s-sparse signal x 
by choosing s components uniformly at random and setting them to one. In the case 
of perturbed signals, we add to the signal a ci-dimensional error vector with Gaussian 
entries. In the case of perturbed measurements, we add an m-dimensional error vec- 
tor with Gaussian entries to the measurement vector $x. We then execute ROMP 
with the measurement vector u = or m + e in the perturbed measurement case. 
After CoSaMP terminates (using a fixed number of iterations of 10s), we output the 
reconstructed vector x obtained from the least squares calculation and calculate its 
distance from the original signal. 

Figure 13.2.51 depicts the recovery error ||x — x\\2 when CoSaMP was run with 
perturbed measurements. This plot was generated with d = 256 for various levels of 
sparsity s. The horizontal axis represents the number of measurements m, and the 
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Number of Iterations needed to reconstruct (d=1 0,000, m=200) (Gaussian) 

lOr 




Sparsity Level (s) 

Figure 3.2.4: The number of iterations needed by CoSaMP as a function of the 
sparsity in dimension d = 10, 000 with 200 measurements. 

vertical axis represents the average normalized recovery error. 

Figure 13.2.61 depicts the normalized recovery error when the signal was perturbed 
by a Gaussian vector. Again these results are consistent with our proven theorems, 
but notice that we normalize here by — x<(||i/yi rather than — a;<(/2||i/v^ as our 
theorems suggest. These plots show that perhaps the former normalization factor is 
actually more accurate, and the latter may be a consequence of our analysis only. 



3.2.6 Summary 

CoSaMP draws on both algorithmic ideas and analytic techniques that have appeared 
before. Here we summarize the results in the context of other work. This discussion 
can also be found in [53]. The initial discovery works on compressive sampling pro- 
posed to perform signal recovery by solving a convex optimization problem [H [19] 
(see also Section [2711 above) . Given a sampling matrix $ and a noisy vector of sam- 
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Normalized Recovery Error for CoSaMP on perturbed measurements (d=256) 
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Figure 3.2.5: The error to noise ratio 




clS db function of the number of measure- 



ments m in dimension d = 256 for various levels of sparsity s. 

pies u = $a; + e with ||e||2 < e, we consider the mathematical program (12.1.51) . In 
words, we look for a signal reconstruction that is consistent with the samples but 
has minimal ii norm. The intuition behind this approach is that minimizing the ii 
norm promotes sparsity, so allows the approximate recovery of compressible signals. 
Candes, Romberg, and Tao established in [5] that a minimizer a of fl2.1.5p satisfies 



provided that the sampling matrix $ has restricted isometry constant 64s < 0.2. In 
[8], the hypothesis on the restricted isometry constant is sharpened to < — 1- 
The error bound for CoSaMP is equivalent, modulo the exact value of the constants. 

The literature describes a huge variety of algorithms for solving the optimization 
problem fl2.1.5l) . The most common approaches involve interior-point methods [U 




(3.2.10) 
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Normalized Recovery Error for CoSaMP on perturbed signals (d=256) 




75 100 125 150 175 200 225 250 
Number of measurements (m) 

Figure 3.2.6: The error to noise ratio |jJ|i!^~^^J|^^ using a perturbed signal, as a function 
of the number of measurements m in dimension d = 256 for various levels of sparsity 
s. 



projected gradient methods or iterative thresholding [TB] The interior-point 
methods are guaranteed to solve the problem to a fixed precision in time 0(m^(i^'^), 
where m is the number of measurements and d is the signal length [56j. Note that 
the constant in the big-0 notation depends on some of the problem data. The other 
convex relaxation algorithms, while sometimes faster in practice, do not currently 
offer rigorous guarantees. CoSaMP provides rigorous bounds on the runtime that are 
much better than the available results for interior-point methods. 

Tropp and Gilbert proposed the use of a greedy iterative algorithm called orthog- 
onal matching pursuit (OMP) for signal recovery [02] (see also Section [2.2.11 above). 
Tropp and Gilbert were able to prove a weak result for the performance of OMP 
[52] . Suppose that x is a fixed, s-sparse signal, and let m = Cs logs. Draw an m x s 
sampling matrix $ whose entries are independent, zero- mean subgaussian random 
variables with equal variances. Given noiseless measurements u = $x, OMP recon- 
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structs X after s iterations, except with probabihty s^^. In this setting, OMP must 
fail for some sparse signals so it does not provide the same uniform guarantees 
as convex relaxation. It is unknown whether OMP succeeds for compressible signals 
or whether it succeeds when the samples are contaminated with noise. 

Donoho et al. invented another greedy iterative method called stagewise OMP, or 
StOMP [23j (see also Section [2.2.21 above). This algorithm uses the signal proxy to 
select multiple components at each step, using a rule inspired by ideas from wireless 
communications. The algorithm is faster than OMP because of the selection rule, and 
it sometimes provides good performance, although parameter tuning can be difficult. 
There are no rigorous results available for StOMP. 

Needell and Vershynin developed and analyzed another greedy approach, called 
regularized OMP, or ROMP [55|, [51] (see also Section 13.11 above). The work on 
ROMP represents an advance because the authors establish under restricted isometry 
hypotheses that their algorithm can approximately recover any compressible signal 
from noisy samples. More precisely, suppose that the sampling matrix $ has restricted 
isometry constant < 0.01/v^logs. Given noisy samples u = $a; + e, ROMP 
produces a 2s-sparse signal approximation a that satisfies 

\\x — a 

II2 < C^/h^ 

This result is comparable with the result for convex relaxation, aside from the ex- 
tra logarithmic factor in the restricted isometry hypothesis and the error bound. 
The results for CoSaMP show that it does not suffer these parasitic factors, so its 
performance is essentially optimal. 



\\x 



+ e 
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3.3 Reweighted Ll-Minimization 

As discussed in Section 12.11 the £i-minimization problem (I2.1.ip is equivalent to 
the nonconvex problem f ll.l.2p when the measurement matrix $ satisfies a certain 
condition. Let us first state the best known theorem for recovery using ii, provided 
by Candes, in more detail. We will rely on this result to prove the new theorems. 

Theorem 3.3.1 (£i-minimization from [8]). Assume $ has < — 1- Let x 

be an arbitrary signal with noisy measurements + e, where ||e||2 < £■ Then the 
approximation x to x from £i-minimization satisfies 

\\x-xh<Ce + C'^^^^^p^, 

where C = ^, C = p = and a = . 

The key difference between the two problems of course, is that the £i formulation 
depends on the magnitudes of the coefficients of a signal, whereas the £o does not. To 
reconcile this imbalance, a new weighted £i-minimization algorithm was proposed by 
Candes, Wakin, and Boyd ^. This algorithm solves the following weighted version 
of (Li) at each iteration: 

d 

min y 6iXi subject to ^x = $£. (WLi) 

i=l 

It is clear that in this formulation, large weights 6i will encourage small coordinates 
of the solution vector, and small weights will encourage larger coordinates. Indeed, 
suppose the s-sparse signal x was known exactly, and that the weights were set as 



3.3. Reweighted Ll-Minimization 



121 



Notice that in this case, the weights are infinite at all locations outside of the support 
of X. This will force the coordinates of the solution vector x at these locations to 
be zero. Thus if the signal x is s-sparse with s < m, these weights would guarantee 
that X = X. Of course, these weights could not be chosen without knowing the actual 
signal X itself, but this demonstrates the positive effect that the weights can have on 
the performance of £i-minimization. 

The helpful effect of the weights can also be viewed geometrically. Recall that the 
solution to the problem (Li) is the contact point where the smallest £i-ball meets the 
subspace x + ker $. When the solution to (Li) does not coincide with the original 
signal X, it is because there is an £i-ball smaller than the one containing x, which 
meets the subspace x + ker The solution to problem (WLi), however, is the place 
where the weighted £i-ball meets the subspace. When the weights are appropriate, 
this is an £i-ball that has been pinched toward the signal x (see Figure 13.3.11) . This 
new geometry reduces the likelihood of the incorrect solution. 




Figure 3.3.1: Weighted £i-ball geometry (right) versus standard (left). 



Although the weights might not initially induce this geometry, one hopes that 
by solving the problem {WLi) at each iteration, the weights will get closer to the 
optimal values (13.3.11) . thereby improving the reconstruction of x. Of course, one 
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cannot actually have an infinite weight as in fl3.3.ip . so a stability parameter must 
also be used in the selection of the weight values. The reweighted £i-minimization 
algorithm can thus be described precisely as follows. 

Reweighted ^i-minimization 

Input: Measurement vector u G M*", stability parameter a 

Output: Reconstructed vector x 

Initialize Set the weights Si = 1 for i = 1 . . .d. 

Repeat the following until convergence or a fixed number of times: 

Approximate Solve the reweighted £i-minimization problem: 

d 



Remark 3.3.2. Note that the optional set of constraints given in the algorithm is 
only for the case in which the signal or measurements may be corrupted with noise. 
It may also be advantageous to decrease the stability parameter a so that a ^ as 
the iterations tend to infinity. See the proof of Theorem \3. 3.3\ below for details. 

In [7], the reweighted £i-minimization algorithm is discussed thoroughly, and 
experimental results are provided to show that it often outperforms the standard 
method. However, no provable guarantees have yet been made for the algorithm's 
success. Here we analyze the algorithm when the measurements and signals are 
corrupted with noise. Since the reweighted method needs a weight vector that is 




Update Reset the weights: 



1 



Xi\ + a 
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somewhat close to the actual signal x, it is natural to consider the noisy case since 
the standard £i -minimization method itself produces such a vector. We are able to 
prove an error bound in this noisy case that improves upon the best known bound 
for the standard method. We also provide numerical studies that show the bounds 
are improved in practice as well. 



The main theorem of this paper guarantees an error bound for the reconstruction 
using reweighted £i-minimization that improves upon the best known bound of The- 
orem [3311] for the standard method. For initial simplicity, we consider the case where 
the signal x is exactly sparse, but the measurements u are corrupted with noise. Our 
main theorem. Theorem 13.3.31 will imply results for the case where the signal x is 
arbitrary. 

Theorem 3.3.3 (Reweighted ii, Sparse Case). Assume $ satisfies the restricted 
isometry condition with parameters {2s, 5) where 6 < \pl — 1. Let x he an s-sparse 
vector with noisy measurements u = $x + e where ||e||2 < s. Assume the smallest 
nonzero coordinate fi of x satisfies /i > yz| ■ Then the limiting approximation from 
reweighted ii-minimization satisfies 



3.3.1 Main Results 



X — x||2 < C"e, 



where C' 



III 




I and a 



1-5 



Remarks. 
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1. We actually show that the reconstruction error satisfies 



2(xe 



(3.3.2) 



X — a;||2 < 



1 + 



This bound is stronger than that given in Theorem 13.3.31 which is only equal to this 



simpler and clearly shows the role of the parameter b by the use of p. 

2. For signals whose smallest non-zero coefficient /x does not satisfy the condition 
of the theorem, we may apply the theorem to those coefficients that do satisfy this 
requirement, and treat the others as noise. See Theorem 13.3.41 below. 

3. Although the bound in the theorem is the limiting bound, we provide a recur- 
sive relation fl3.3.9p in the proof which provides an exact error bound per iteration. 
In Section [3. 3. 31 we use dynamic programming to show that in many cases only a very 
small number of iterations are actually required to obtain the above error bound. 

We now discuss the differences between Theorem 13.3.11 and our new result Theo- 
rem [3]3]3l In the case where b nears its limit of — 1, the constant p increases to 1, 
and so the constant C in Theorem 13.3.11 is unbounded. However, the constant C" in 
Theorem 13.3.31 remains bounded even in this case. In fact, as b approaches \pl — 1, 
the constant C" approaches just 4.66. The tradeoff of course, is in the requirement 
on /i. As b gets closer to \pi — 1, the bound needed on p requires the signal to have 
unbounded non-zero coordinates relative to the noise level e. However, to use this 
theorem efficiently, one would select the largest b < \pl — 1 that allows the require- 
ment on p to be satisfied, and then apply the theorem for this value of b. Using this 
strategy, when the ratio ^ = 10, for example, the error bound is just 3.85£:. 

Theorem 13 . 3 . 3 1 and a short calculation will imply the following result for arbitrary 



bound when p nears the value However, the form in Theorem 13.3.31 is much 
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signals x. 

Theorem 3.3.4 (Reweighted Assume $ satisfies the restricted isometry condi- 
tion with parameters (2s, — 1). Let x be an arbitrary vector with noisy measure- 
ments u = $x + e where ||e||2 < s. Assume the smallest nonzero coordinate fi of Xg 
satisfies /i > where Eq = 1.2(||a; — Xs\\2 + — Xs\\i) + e. Then the limiting 
approximation from reweighted ii-minimization satisfies 

4.1a f\\x - Xs/2\\i , \ 

k-a; 2 < -r~[ 7= + ^ ' 

1 + p V V s / 

and 

2Aa ( llx — \ 
F - 3; 2 < — — F - 2 H p He, 

where p and a are as in Theorem \3. 3.3[ 

Again in the case where S nears its bound of \/2 — 1, both constants C and C in 
Theorem 13.3.11 approach infinity. However, in Theorem 13.3.41 the constant remains 
bounded even in this case. The same strategy discussed above for Theorem 13.3.31 
should also be used for Theorem I3.3.4[ Next we begin proving Theorem 13.3.31 and 
Theorem 13.3.41 

3.3.2 Proofs 

We will first utilize a lemma that bounds the £2 norm of a small portion of the 
difference vector a; — x by the £i-norm of its remainder. This lemma is proved in [8] 
and essentially in [5] as part of the proofs of the main theorems of those papers. We 
include a proof here as well since we require the final result as well as intermediate 
steps for the proof of our main theorem. 
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Lemma 3.3.5. Set h = x ~ x, and let a, e, and p be as in Theorem \3. 3.3[ Let Tq be 

the set of s largest coefficients in magnitude of x and Ti be the s largest coefficients 
of . Then 

II^TouTilb <ae + -^\\hT-\\i. 

Proof. Continue defining sets Tj by setting Ti to be the s largest coefficients of Ht^, 
T2 the next s largest coefficients of Ht^, and so on. We begin by applying the triangle 
inequality and using the fact that x itself is feasible in ( [VrLiD . This yields 



||$/i||2 < - u\\2 + - u\\2 < 2e. (3.3.3) 

By the decreasing property of these sets and since each set Tj has cardinality at 
most s, we have for each j > 2, 



Summing the terms, this gives 



J2\\hT,h<—J\hTs\\i- (3.3.4) 
i>2 



By the triangle inequality, we then also have 



|^(ToUTi)':||2 < ^II^T=||l- (3.3.5) 

V -5 



Now by linearity we have 



i>2 
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K*/iToUTi,$/i)| < ll^/iToUrilbll^/^lb < 2£Vl + 5||/iToUTi||2. 

As shown in Lemma 2.1 of [8], the restricted isometry condition and parallelogram 
law imply that for j > 2, 

|($/tTo,*H)| - ^W^f^TohW^hr^y and $/iT,)| < SW^hrM^hT^y. 

Since all sets Tj are disjoint, the above three inequalities yield 

{I - S)\\hT,uTA\l < ||$HuTxll2< WhTouT.hi^eVTTS + V26Y,\\hT,h)- 

i>2 

Therefore by (13.3.41) . we have 

□ 

We will next require two lemmas that give results about a single iteration of 
reweighted £i-minimization. 

Lemma 3.3.6 (Single reweighted £i-minimization). Assume $ satisfies the restricted 
isometry condition with parameters (2s, \/2 — 1). Let x be an arbitrary vector with 
noisy measurements u = $x + e where ||e||2 < e. Let w be a vector such that 
\\w — a;||oo ^ A for some constant A. Denote by Xg the vector consisting of the 
s (where s < |supp(a;)|j largest coefficients of x in absolute value. Let fi be the 
smallest coordinate of Xg in absolute value, and set b = \\x — Xs||oo- Then when 
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fi > A and pCi < 1, the approximation from reweighted ii-minimization using weights 
6i = 1/ {wi + a) satisfies 

\\x - xlU < Die + D2— — 

a 

,nhpr-P n - (l+<^l)" n _ r I (l+gi)pC2 (-i _ A+a+b ^ _ 2{A+a+b) , , 

Where Ui - ^z^; 1^2 - 02 + i^^^i ' '-^i " ]I::aT^' - — ^75 — , and p and a 
are as in Theorem \3. 3.31 

Proof of Lemma \3. 3. (A Set h and Tj for j > as in Lemma 13.3.51 For simplicity, 
denote by || • H^^, the weighted £i-norm: 



d 

dcf 

\Wi\ + a 



d 



Since x = a; + is the minimizer of we have 



This yields 

Next we relate the reweighted norm to the usual £i-norm. We first have 

|^T,f 111 



^T,?|L > 



A + a + 6' 



by definition of the reweighted norm as well as the values of A, a, and h. Similarly 
we have 

II 7^ II < "^^""^ 
II "-To II w ^ 



— A + a 
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Combining the above three inequahties, we have 



IIK^IIi < {A + a + b)\\hTc\\yj < {A + a + b){\\hTj^ + 2\\xT§\\n^) 

A + a + b 



< 



\\hT,\\i + 2{A + a + b)\\xT^4.^. (3.3.6) 



II — A + a 

Using fl3.3.5p and fl3.3.6p along with the fact ||/;.ToIIi ^ V^II^Tolb, we have 

||^(TouTi)<=||2 < CiWhroh + C2\\xt-\\uj, (3.3.7) 
where Ci = ^^^'^^^ and C2 = H(;l±|t^. By Lemma [3.3.5[ we have 

II^TouTilh <ae + -^\\hTc\\i, 
where p = and a = Thus by flSXBj) . we have 

WhTounh < ae+-^{Ci\\hTo\\i+2{A+a+b)\\xT^';\\w) = <y£+pCi\\hTouTA\2+pC2\\xT§\\w 
Therefore, 

\\hTouTA2<il~pCi)-\ae + pC2\\xTs\U. (3.3.8) 
Finally by (l3X7Il and (EASD, 



2 < ||/iToUTi II2 + ||/''(ToUTi)<=||2 

< (1+Ci)||/IToUtJ|2 + C2|| Xt§ \\w 

< (1 + Ci)((l - pCi)-\ae + pC2\\xTs\U)) + C2\\xts 



Applying the inequality HxTq^^Hu, < (l/a)||xT'Q':||i and simplifying completes the claim. 
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□ 

Lemma 3.3.7 (Single reweighted £i-minimization, Sparse Case). Assume $ satisfies 
the restricted isometry condition with parameters (2s, — 1). Let x be an s-sparse 
vector with noisy measurements u = $x + e where ||e||2 < s. Let w be a vector such 
that \\w ~ x\\oo < A for some constant A. Let fi be the smallest non-zero coordinate 
of X in absolute value. Then when jj, > A, the approximation from reweighted £i- 
minimization using weights 6i = 1/ {wi + a) satisfies 

\\x — x\\2 < Die. 

Here Di = , Ci = ^° ^ , and a and p are as in Theorem \3. 3.3\ . 

Proof. This is the case of Lemma [3.3.61 where x — Xg = and 6 = 0. □ 

Proof of Theorem \3. 3.3[ The proof proceeds as follows. First, we use the error bound 
in Theorem 13.3.11 as the initial error, and then apply Lemma 13.3.71 repeatedly. We 
show that the error decreases at each iteration, and then deduce its limiting bound 
using the recursive relation. To this end, let E{k) for A; = 1, . . ., be the error bound 
on ||a; — Sfc||2 where Xk is the reconstructed signal at the k^^ iteration. Then by 
Theorem 13.3.11 and Lemma 13.3.71 we have the recursive definition 

E{l) = ^e, E{k + l)= \ ^-g:^; (3.3.9) 



Here we have taken a ^ iteratively (or if a remains fixed, a small constant 0(a) 
will be added to the error). First, we show that the base case holds, E{1) < E{2). 



3.3. Reweighted Ll-Minimization 131 
Since /i > we have that 

E(l) _ S <i. 



Therefore we have 



(1 + -^)q; 2a 

Next we show the inductive step, that E{k-\-l) < E{k) assuming the inequahty holds 
for all previous k. Indeed, if E{k) < E{k — 1), then we have 



1^ H-E{k) ^ 1^ n-E{k-l) 

Since // > |^ and p < 1 we have that // - > and p^^^^ < 1, so is 
also bounded below by zero. Thus E(k) is a bounded decreasing sequence, so it must 
converge. Call its limit L. By the recursive definition of E{k), we must have 

(l + -^.)a 

Solving this equation yields 

H — \/ — 4:nae — A/iaep 



2(1 + P) 



where we choose the solution with the minus since E{k) is decreasing and £'(1) < ///2 
(note also that L = when 6 = 0). Multiplying by the conjugate and simplifying 
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yields 



2(1 + p) (/^ + v^/x^ - Apae - Aixaep) i + _ l^e _ 4aep" 
Then again since /i > we have 



2ae 

L < 



1 + P 



□ 



Proof of Theorem \3. 3.4\ By Lemma 6.1 of |53] and Lemma 7 of [32], we can rewrite 



$x + e as + e where 



|e||2 < 1.2(||x - a;,||2 + - x,||i) + ||e||2 < 2m(— — ) + ||e||2. 



This combined with Theorem 13.3.31 completes the claim. □ 
3.3.3 Numerical Results and Convergence 

Our main theorems prove bounds on the reconstruction error limit. However, as is 
the case with many recursive relations, convergence to this threshold is often quite 
fast. To show this, we use dynamic programming to compute the theoretical error 
bound E{k) given by f l3.3.9p and test its convergence rate to the threshold given 
by eqrefactualbnd. Since the ratio between /i and e is important, we fix /i = 10 and 
test the convergence for various values of e and 5. We conclude that the error bound 
has achieved the threshold when it is within 0.001 of it. The results are displayed in 
Figure 133:21 
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Figure 3.3.2: The number of iterations required for the theoretical error 
bounds eqrefEk to reach the theoretical error threshold (13.3.21) when (a) /i = 10, 
e = 0.01, (b) /i = 10, e = 0.1, (c) = 10, e = 0.5, (d) /i = 10, e = 1.0. 

We observe that in each 5 increases we require slightly more iterations. 

This is not surprising since higher 6 means a lower bound. We also confirm that less 
iterations are required when the ratio /i/e is smaller. 

Next we examine some numerical experiments conducted to test the actual error 
with reweighted £i-minimization versus the standard li method. In these experiments 
we consider signals of dimension d = 256 with s = 30 non-zero entries. We use a 128 x 
256 measurement matrix $ consisting of Gaussian entries. We note that we found 
similar results when the measurement matrix $ consisted of symmetric Bernoulli 
entries. Sparsity, measurement, and dimension values in this range demonstrate 
a good example of the advantages of the reweighted method. Our results above 
suggest that improvements are made using the reweighted method when 6 cannot be 
forced very small. This means that in situations with sparsity levels s much smaller 
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or measurement levels m much larger, the reweighted method may not offer much 
improvements. These levels are not the only ones that show improvements, however, 
see for example those experiments in [7j. 

For each trial in our experiments we construct an s-sparse signal x with uniformly 
chosen support and entries from either the Gaussian distribution or the symmetric 
Bernoulli distribution. All entries are chosen independently and independent of the 
measurement matrix $. We then construct the noise vector e to have independent 
Gaussian entries, and normalize the vector to have norm a fraction (1/5) of the 
noiseless measurement vector $x norm. We then run the reweighted £i-algorithm 
using e such that = cr^(m + 2\/2m) where o"^ is the variance of the normalized 
error vectors. This value is likely to provide a good upper bound on the noise norm 
(see e.g. |5], [7]). The stability parameter a tends to zero with increased iterations. 
We run 500 trials for each parameter selection and signal type. We found similar 
results for non-sparse signals such as noisy sparse signals and compressible signals. 
This is not surprising since we can treat the signal error as measurement error after 
applying the the measurement matrix (see the proof of Theorem 13.3.41) . 

Figure [3.3.31 depicts the experiment with Gaussian signals and decreasing stability 
parameter a. In particular, we set a = l/(1000fc) in the k^^ iteration. The plot 
(left) depicts the error after a single £i-minimization and after 9 iterations using the 
reweighted method. The histogram (right) depicts the improvements ||x — x||2/||a; — 
x*\\2 where x and x* are the reconstructed vectors after 9 iterations of reweighted 
and a single £i-minimization, respectively. 

Figure 13.3.41 depicts the same results as Figure 13.3.31 above, but for the exper- 
iments with Bernoulli signals. We see that in this case again reweighted £i offers 
improvements. In this setting, the improvements in the Bernoulli signals seem to be 
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Figure 3.3.3: Improvements in the £2 reconstruction error using reweighted £1- 
minimization versus standard £i-minimization for Gaussian signals. 

shghtly less than in the Gaussian case. It is clear that in the case of flat signals like 
Bernoulli, the requirement on /i in Theorem 13.3.31 may be easier to satisfy, since the 
signal will have no small components unless they are all small. However, in the case of 
flat signals, if this requirement is not met, then the Theorem guarantees no improve- 
ments whatsoever. In the Gaussian case, even if the requirement on fi is not met, the 
Theorem still shows that improvements will be made on the larger coefficients. 



Reconstruction error for d=256 m=128 s=30 with d ecreasino e 

One iteration 




Improvement by reweighted LI after 9 iterations for d=256 m=128 s=30 with decreasing e 



200 300 
Trials 




0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 
Improvement factor 



Figure 3.3.4: Improvements in the £2 reconstruction error using reweighted £1- 
minimization versus standard £i-minimization for Bernoulli signals. 
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Although the algorithms in compressed sensing themselves are not random, the mea- 
surement matrices used in the compression step are. The suggestion that randomness 
often makes things easier is the key to the idea of randomized algorithms. It is often 
the case that a deterministic algorithm's flaw can be fixed by introducing randomness. 
This notion is at the heart of a new randomized version of the well known Kaczmarz 
algorithm. Although the work on Kaczmarz is outside the realm on compressed 
sensing, it does bare some striking similarities. 

The Kaczmarz method [41| is one of the most popular solvers of overdetermined 
linear systems and has numerous applications from computer tomography to image 
processing. The algorithm consists of a series of alternating projections, and is often 
considered a type of Projection on Convex Sets (POCS) method. Given a consistent 
system of linear equations of the form 

Ax = 6, 

the Kaczmarz method iteratively projects onto the solution spaces of each equation 
in the system. That is, if Oil, ... , G M" denote the rows of A, the method cycli- 
cally projects the current estimate orthogonally onto the hyperplanes consisting of 
solutions to {ai,x) = h^. Each iteration consists of a single orthogonal projection. 
The algorithm can thus be described using the recursive relation, 

_ hi - {auXk) 

^k+l — ~r II ||2 Oii-i 

II '^i II 2 

where x^ is the fc*^ iterate and i = {k mod m) + 1. 
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Theoretical results on the convergence rate of the Kaczmarz method have been 
difficult to obtain. Most known estimates depend on properties of the matrix A 
which may be time consuming to compute, and are not easily comparable to those 
of other iterative methods (see e.g. [18], [29], [37]). Since the Kaczmarz method 
cycles through the rows of A sequentially, its convergence rate depends on the order 
of the rows. Intuition tells us that the order of the rows of A does not change the 
difficulty level of the system as a whole, so one would hope for results independent of 
the ordering. One natural way to overcome this is to use the rows of A in a random 
order, rather than sequentially. Several observations were made on the improvements 
of this randomized version [501 EH], but only recently have theoretical results been 
obtained [6T] . 

In designing a random version of the Kaczmarz method, it is necessary to set 
the probability of each row being selected. Strohmer and Vershynin propose in [61] 
to set the probability proportional to the Euclidean norm of the row. Their revised 
algorithm can then be described by the following: 

■i'k+l — ■i'k ~\ 11 

Il^p(j)ll2 

\\a ■ IP 

where p{i) takes values in {1, ... , m} with probabilities -plp^- Here and throughout, 
||y4||^ denotes the Frobenius norm of A and || ■ II2 denotes the usual Euclidean norm or 
spectral norm for vectors or matrices, respectively. We note here that of course, one 
needs some knowledge of the norm of the rows of A in this version of the algorithm. 
In general, this computation takes 0{mn) time. However, in many cases such as 
the case in which A contains Gaussian entries, this may be approximately or exactly 
known. 
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In [6T], Strohmer and Vershynin prove the following exponential bound on the 

expected rate of convergence for the randomized Kaczmarz method, 



where i?=||y4 ^P||v4|||n and xq is an arbitrary initial estimate. Here and throughout, 
\\A-^\\ = inf{M : M\\Ax\\2 > \\x\\2 for all x}. 

The first remarkable note about this result is that it is essentially independent 
of the number m of equations in the system. Indeed, by the definition of R, R is 
proportional to n within a square factor of the condition number of A. Also, since 
the algorithm needs only access to the randomly chosen rows of A, the method need 
not know the entire matrix A. Indeed, the bound (13.4. ip and the relationship of R to 
n shows that the estimate x^ converges exponentially fast to the solution in just 0(n) 
iterations. Since each iteration requires 0(n) time, the method overall has a O(n^) 
runtime. Thus this randomized version of the algorithm provides advantages over all 
previous methods for extremely overdetermined linear systems. There are of course 
situations where other methods, such as the conjugate gradient method, outperform 
the randomized Kaczmarz method. However, numerical studies in [61] show that in 
many scenarios (for example when A is Gaussian), the randomized Kaczmarz method 
provides faster convergence than even the conjugate gradient method. 

The remarkable benefits provided by the randomized Kaczmarz algorithm lead one 
to question whether the method works in the more realistic case where the system 
is corrupted by noise. In this paper we provide theoretical and empirical results to 
suggest that in this noisy case the method converges exponentially to the solution 
within a specified error bound. The error bound is proportional to VR, and we also 
provide a simple example showing this bound is sharp in the general setting. 




(3.4.1) 



3.4. Randomized Kaczmarz 



139 



3.4.1 Main Results 

In this section we discuss the results by Needell in |51] . 

Theoretical and empirical studies have shown the randomized Kaczmarz algorithm 
to provide very promising results. Here we show that it also performs well in the case 
where the system is corrupted with noise. In this section we consider the system 
Ax = b after an error vector r is added to the right side: 



First we present a simple example to gain intuition about how drastically the noise 
can affect the system. To that end, consider the n x n identity matrix A. Set 6 = 0, 
X = 0, and suppose the error is the vector whose entries are all one, r = (1, 1, . . . , 1). 
Then the solution to the noisy system is clearly r itself, and so by (13.4. ip . the iterates 
Xk of randomized Kaczmarz will converge exponentially to r. Since A is the identity 
matrix, we have R = n. Then by (13.4.11) and Jensen's inequality, we have 



Ax ^ b + r. 




1 N fc/2 



Then by the triangle inequality, we have 




Finally by the definition of r and R, this implies 
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This means that the hmiting error between the iterates Xk and the original solution 
X is \/R. In [61j it is shown that the bound provided in (13.4.11) is optimal, so if we 
wish to maintain a general setting, the best error bound for the noisy case we can 
hope for is proportional to \/R. Our main result proves this exact theoretical bound. 

Theorem 3.4.1 (Noisy randomized Kaczmarz). Let he the k*^ iterate of the noisy 
Randomized Kaczmarz method run with Ax ~ b + r, and let ai, . . . am denote the rows 
of A. Then we have 

Ellx* -x||2 < (^1 - -j ||a;o||2 + Vi?7, 
where R = llA'-^W^WAW^ and 7 = maxj irN-- 

II II II Wf I I ||ai||2 

Remark 3.4.2. In the case discussed above, note that we have 7 = 1, so the example 
indeed shows the bound is sharp. 

Before proving the theorem, it is important to first analyze what happens to the 
solution spaces of the original equations Ax = b when the error vector is added. 
Letting ai, . . . denote the rows of A, we have that each solution space (a^, x) = bi 
of the original system is a hyperplane whose normal is When noise is added, 

each hyperplane is translated in the direction of a^. Thus the new geometry consists 
of hyperplanes parallel to those in the noiseless case. A simple computation provides 
the following lemma which specifies exactly how far each hyperplane is shifted. 

Lemma 3.4.3. Let Hi be the affine subspace of consisting of the solutions to 
{ai,x) = bi. Let H* be the solution space of the noisy system {ai,x) = bi + Vi. Then 
H* = {w + ttiQi : w G Hi] where ai = jr-kri. 

Proof. First, ifwE Hi then {ai,w + aai) = (aj, w) + a;||aj||| = bi+Vi, so w+aai G H*. 
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Next let u G H*. Set w = u — aai. Then (aj, w) = {ai, u) — Vi = bi + Vi — Vi = bi, so 
w G H*. This completes the proof. □ 

We will also utilize the following lemma which is proved in the proof of Theorem 
2 in iU. 



Lemma 3.4.4. Let x*f,_-^ he any vector in M" and let Xk he its orthogonal projection 
onto a random solution space as in the noiseless randomized Kaczmarz method run 
with Ax = b. Then we have 



where R= \\A-Y\\Ml- 



We are now prepared to prove Theorem 13.4. 1[ 



Proof of Theorem \3.4-i\ Let xl_i denote the {k — l)*'^ iterate of noisy Randomized 
Kaczmarz. Using notation as in Lemma 13.4.31 let H* be the solution space chosen 
in the k^^ iteration. Then xl is the orthogonal projection of xl__i onto H*. Let Xk 
denote the orthogonal projection of x^^^^ onto Hi (see Figure [3".4.ip . 




Figure 3.4.1: The parallel hyperplanes Hi and H* along with the two projected 
vectors Xk and xl- 
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By Lemma 13.4.31 and the fact that is orthogonal to Hi and H* , we have that 
x\—x = Xk—x+Uiai. Again by orthogonahty, we have x||2 = x||2 + ||aiai||2- 
Then by Lemma [3.4.41 and the definition of 7, we have 



n^i-x\\i< {i-}^\\xi_,-x\\i+^\ 



where the expectation is conditioned upon the choice of the random selections in the 
first k — 1 iterations. Then applying this recursive relation iteratively and taking full 
expectation, we have 



E||4 - x\\l < (1 - Vo - x\\l + ^ (^1 - 1^' 
< (l-l)'||xo-a;||^ + i?7'- 



1' 



By Jensen's inequality we then have 

Mxl-x\\2< [(l- -j \\xq-x\\1 + R-i^\ <(l--J ||xo-x||2 + Vi?7- 
This completes the proof. □ 



3.4.2 Numerical Examples 

In this section we describe some of our numerical results for the randomized Kacz- 
marz method in the case of noisy systems. The results displayed in this section use 
matrices with independent Gaussian entries. Figure 13.4.21 depicts the error between 
the estimate by randomized Kaczmarz and the actual signal, in comparison with the 
predicted threshold value. This study was conducted for 100 trials using 100 x 50 
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Gaussian matrices and independent Gaussian noise. The systems were homogeneous, 
meaning x = and 6 = 0. The thick hne is a plot of the threshold value, 'y^/R for 
each trial. The thin line is a plot of the error in the estimate after 1000 iterations 
for the corresponding trial. As is evident by the plots, the error is quite close to the 
threshold. Of course in the Gaussian case depicted, it is not surprising that often the 
error is below the threshold. As discussed above, the threshold is sharp for certain 
kinds of matrices and noise vectors. 

Error in estimation: Gaussian 100 by 50 after 1000 iterations. 

0.18r 

Error 

0.16 - 

threshold 

0.14 - 

0.12 - 

^ 0.1 - 
o 




Q\ : , , : , 

20 40 60 80 100 

Trials 



Figure 3.4.2: The comparison between the actual error in the randomized Kaczmarz 
estimate (thin line) and the predicted threshold (thick line). 

Our next study investigated the convergence rate for the randomized Kaczmarz 
method with noise for homogeneous systems. Again we let our matrices A be 100 x 50 
Gaussian matrices, and our error vector be independent Gaussian noise. Figure 13.4.31 
displays a scatter plot of the results of this study over various trials. It is not sur- 
prising that the convergence appears exponential as predicted by the theorems. 
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Error in estimation: Gaussian 100 by 50 

1.2 r 
1 o 

0.8 - i 




Iterations 

Figure 3.4.3: Convergence rate for randomized Kaczmarz over various trials. 
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Chapter 4 
Summary 



Compressed sensing is a new and fast growing field of applied mathematics that ad- 
dresses the shortcomings of conventional signal compression. Given a signal with 
few nonzero coordinates relative to its dimension, compressed sensing seeks to recon- 
struct the signal from few nonadaptive linear measurements. As work in this area 
developed, two major approaches to the problem emerged, each with its own set of 
advantages and disadvantages. The first approach, Ll-Minimization [El [5], provided 
strong results, but lacked the speed of the second, the greedy approach. The greedy 
approach, while providing a fast runtime, lacked stability and uniform guarantees. 
This gap between the approaches has led researchers to seek an algorithm that could 
provide the benefits of both. In collaboration, my adviser Roman Vershynin and I 
have bridged this gap and provided a breakthrough algorithm, called Regularized 
Orthogonal Matching Pursuit (ROMP) [551 [5l]- ROMP is the first algorithm to pro- 
vide the stability and uniform guarantees similar to those of Ll-Minimization, while 
providing speed as a greedy approach. After analyzing these results, my colleague 
Joel Tropp and I developed the algorithm Compressive Sampling Matching Pursuit 
(CoSaMP), which improved upon the guarantees of ROMP [551 15^ . CoSaMP is the 
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first algorithm to have provably optimal guarantees in every important aspect. 

It was the negative opinions about conventional signal compression that spurred 
the development of compressed sensing. The traditional methodology is a costly 
process, which acquires the entire signal, compresses it, and then throws most of the 
information away. However, new ideas in the field combine signal acquisition and 
compression, significantly improving the overall cost. The problem is formulated as 
the realization of such a signal x from few linear measurements, of the form $x where 
$ is a (usually random) measurement matrix. Since Linear Algebra clearly shows 
that recovery in this fashion is not possible, the domain from which the signal is 
reconstructed must be restricted. Thus the domain that is considered is the domain 
of all sparse vectors. In particular, we call a signal s-sparse when it has s or less 
nonzero coordinates. It is now well known that many signals in practice are sparse 
in this sense or with respect to a different basis. 

As discussed, there are several critical properties that an ideal algorithm in com- 
pressed sensing should possess. The algorithm clearly needs to be efficient in practice. 
This means it should have a fast runtime and low memory requirements. It should 
also provide uniform guarantees so that one measurement matrix works for all signals 
with high probability. Lastly, the algorithm needs to provide stability under noise in 
order to be of any use in practice. The second approach uses greedy algorithms and 
is thus quite fast both in theory and in practice, but lacks both stability and uniform 
guarantees. The first approach relies on a condition called the Restricted Isometry 
Property (RIP), which had never been usefully apphed to greedy algorithms. For a 
measurement matrix $, we say that $ satisfies the RIP with parameters {s,e) if 

(l-£)||t^||2 < ll^^^lh < (1+£)||^;||2 
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holds for all s-sparse vectors v. We analyzed this property in a unique way and found 
consequences that could be used in a greedy fashion. Our breakthrough algorithm 
ROMP is the first to provide all these benefits (stability, uniform guarantees, and 
speed), while CoSaMP improves upon these significant results and provides com- 
pletely optimal runtime bounds. 

One of the basic greedy algorithms which inspired our work is Orthogonal Match- 
ing Pursuit (OMP), which was analyzed by Gilbert and Tropp [52] • OMP uses the 
observation vector u = to iteratively calculate the support of the signal x (which 

can then be used to reconstruct x). At each iteration, OMP selects the largest com- 
ponent of the observation vector u to be in the support, and then subtracts off its 
contribution. Although OMP is very fast, it does not provide uniform guarantees. 
Indeed, the algorithm correctly reconstructs a fixed signal x with high probability, 
rather than all signals. It is also unknown whether OMP is stable. 

Our new algorithm ROMP is similar in spirit to OMP, in that it uses the obser- 
vation vector u to calculate the support of the signal x. One of the key differences in 
the algorithm is that ROMP selects many coordinates of u at each iteration. The reg- 
ularization step of ROMP guarantees that each of the selected coordinates have close 
to an equal share of the information about the signal x. This allows us to translate 
captured energy of the signal into captured support of the signal. We show that when 
the measurement matrix $ satisfies the Restricted Isometry Property with parame- 
ters (2s, c/ log s), ROMP exactly reconstructs any s-sparse signal in just s iterations. 
Our stability results show that for an arbitrary signal x with noisy measurements 
^x + e, ROMP provides an approximation x to x that satisfies 




(4.0.1) 
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where Xg is the vector of the s largest coordinates of x in magnitude. ROMP is thus 
the first greedy algorithm providing the strong guarantees of the Ll-Minimization 
approach, and bridges the gap between the two approaches. 

After the breakthrough of ROMP, Needell and Tropp developed a new algorithm, 
Compressive Sampling Matching Pursuit (CoSaMP) [53l [52] . CoSaMP maintains 
an estimation of the support as well as an estimation of the signal throughout each 
iteration. It again is a greedy algorithm that provides the uniform and stability 
guarantees of the LI approach. CoSaMP improves upon the stability bounds of 
ROMP as well as the number of measurements required by the algorithm. Indeed we 
show that for any measurement matrix satisfying the Restricted Isometry Property 
with parameters (2s, c), CoSaMP approximately reconstructs any arbitrary signal x 
from its noisy measurements u = ^x + e in at most 6s iterations: 



We also provide a rigorous analysis of the runtime, which describes how exactly 
the algorithm should be implemented in practice. CoSaMP thus provides optimal 
guarantees at every important aspect. 
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Appendix A 



Matlab Code 



This section contains original Matlab code used to produce the figures contained 
within this work. 



A.l Basis Pursuit 

°/„NAME: Basis Pursuit Tester 

"/oPURPOSE: Tests sparse, noisy, compressible signals on Basis Pursuit 
"/.AUTHOR: Deanna Needell 

"/.OUTSIDE FUNCTIONS: lleq.pd (Ll-Magic, J. Romberg) 

clear all 
warning off all 

"/.Variables 
N=[10:5:250] ; 
d=[256] ; 
n= [2 : 2 : 90] ; 

p=[0.2:0.2: 1] ; "/.Used for compressible signals 

"/.Number of Trials for each parameter set 
numTrials = 500; 



A.l. Basis Pursuit 



%Counters 

numN = size(N, 2); 

numd = size(d, 2) ; 

numn = size(n, 2); 

nump = size(p, 2) ; 

°/oData Collection 

numCorr = zeros (numN, numd, numn); 
minMeas = zeros (numN) ; 

AvgerrorNormd = zeros (numN, numd, numn, nump); 
Avgerror = zeros (numN, numd, numn, nump); 
%for ip:=l:nump '/.USED FOR COMPRESSIBLE SIGNALS 
for id = l:numd 
for iN=l:numN 
in=l; 
done=0 ; 

while in <= numn && "done 
for trial = l:numTrials 
tN = N(l, iN); 
td = d(l, id); 
tn = n(l , in) ; 
tp = p(l, ip) ; 

"/oSet Matrix 
Phi = randn(tN, td) ; 
Phi = sign(Phi) ; 
I = zerosd , 1) ; 

y„Set signal 
z = randperm(td) ; 
V = zeros (td, 1) ; 
for t = l:tn 
v(z(t))=l; 

end 

'/.USED IN THE CASE OF COMPRESSIBLE SIGNALS 
°/oy = sign(randn(td, 1)); 
yonoiErr=0; 
"/of or i=l:tn 



%v(z(i)) = i^(-l/tp)*y(i); 
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%if i > tn 

7o noiErr = noiErr + abs(v(z(i))) ; 
°/oend 

y,end 

°/oSet measurement and residual 
X = Phi * v; 

y.USED IN THE CASE OF NOISE 

%e = randnCtN, 1); 

7oX = X + e / norm(e, 2) / 2; 

"/oSet initial estimate 
xO = Phi'*x; 

"/oRun Basis Pursuit (via Li-Magic) 
xp = lleq_pd(xO, Phi, [] , x, le-6) ; 

ZCollect Data 

if norm(xp-v, 2) < 0.01 

numCorr(iN, id, in) = numCorrCiN, id, in) +1; 

end 

AvgerrorNormd(iN, id, in, ip) = (AvgerrorNormd(iN, id, in, ip) * 
(counted-1) + (norm(xp-v,2) /noiErr* (tn) "0 . 5) )/counted; 

Avgerror(iN, id, in, ip) = (Avgerror (iN, id, in, ip) * 
(counted-1) + norm(xp-v, 2))/counted; 

end % end Trial 

if numCorr(iN, id, in) / numTrials > 0.98 

minMeas(iN) = tn; 
else 

done=l ; 

end 

in = in +1; 
end % n 
end % N 
end % d 
Zend % p 



A. 2. Orthogonal Matching Pursuit 



A. 2 Orthogonal Matching Pursuit 

"/oNAME: Orthogonal Matching Pursuit Tester 

%PURPOSE: Tests sparse signals on Orthogonal Matching Pursuit 
"/.AUTHOR: Deanna Needell 
"/.OUTSIDE FUNCTIONS: None 

clear all 
warning off all 
"/.Variables 
N=[10:5:250] ; 
d=[256] ; 
n= [2 : 2 : 90] ; 

numTrials = 500; 
numN = size(N, 2) ; 
numd = size(d, 2) ; 
numn = size(n, 2); 

"/.Data Collection 

numCorr = zeros (numN, numd, numn); 
mostSpars = zeros (numN); 

for id = l:numd 
for iN=l:numN 
in=l ; 
done=l ; 
keepGo=l ; 

while in <= numn && keepGo 
for trial = l:nuinTrials 

tN = N(l, iN); 
td = d(l, id); 
tn = n(l , in) ; 

y.Set Matrix 
Phi = randn(tN, td) ; 
Phi = sign (Phi) ; 
I = zerosd , 1) ; 

y.Set signal 

z = randperm(td) ; 
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supp=z(l : tn) ; 
V = zeros (td, 1) ; 
for t = l:tn 
v(z(t))=l; 

end 

X = Phi * v; 
r = x; 

y.Run OMP 

while length (I) -1 < tn 
u = Phi' * r; 
absu = abs (u) ; 

[b, ix] = sort (absu, 'descend'); 
bestind = ix(l) ; 
bestVal = b(l); 

y.Update I 

I(length(I)+l) = bestind; 

"/oUpdate the residual 
PhiSubl = Phi(:, 1(2)); 
for c=3 : length(I) 

if ~isMember(I(2:c-l) ,I(c)) 

PhiSubI(:,c-l) =Phi(:, 1(c)); 

end 

end 

y = lscov(PhiSubI , x) ; 
r = X - PhiSubl * y; 
end 

if isMember (supp, I); 

numCorrdN, id, in) = numCorrdN, id, in) +1; 

end 

end y„ end Trial 

if numCorrdN, id, in) / numTrials > 0.98 && done 
mostSpars(iN) = tn; 

else 

done=0; 

end 
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if numCorrCiN, id, in) <= 0.01 
keepGo=0 ; 

end 

in = in +1; 
end % n 
end % N 
end % d 



A. 3 Regularized Orthogonal Matching Pursuit 



function [vOut, numlts] = romp(n, Phi, x) 

% [vOut] = romp(n. Phi, x) 

Ifil^lfi PARAMETERS XAXAXAX^^^^X^^/^/^/^^^^X^X^ 

% d = ambient dimension of the signal v 

% N = number of measurements 

% n = sparsity level of n 

% Phi = N by d measurement matrix 

% X = measurement vector (Phi * v) 

% vOut = reconstructed signal 

0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/ 

/o/o/o/o/o/o/o/o/o/o/o/o/o/o/o/o/o/o/o/o/o/o/o/o/oA 



Til FUNCTION DESCRIPTION iiyixmxv/myxi 

"L romp takes parameters as described 

above. Given the sparsity level n and 
% the N by d measurement matrix Phi, and 
% the measurement vector x = Phi * v, romp 
% reconstructs the original signal v. 
% This reconstruction is the output. 

til y y til y y y til y y y til (ti <ii (ti <ii (ti <ii (ti <ii y yy y y y y til y til til y y y y y y y y y y y 



clear r I J JO u b ix numlts Jvals 
warning off all 



N = size(Phi, 1) ; 
d = size (Phi, 2) ; 

% Set residual 
r = x; 
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%Set index set to "empty" 
I = zeros (1,1) ; 

"/oCounter (to be used optionally) 
numlts = 0; 

"/oRun ROMP 

while length(I)-l < 2*n M norin(r) > 10~(-6) 
numlts = numlts + 1; 

7oFind J, the biggest n coordinates of u 
u = Phi' * r; 
absu = abs (u) ; 

[b, ix] = sort (absu, 'descend'); 
J = ix(l :n) ; 
Jvals = b(l :n) ; 

ZFind JO, the set of comparable coordinates with maximal energy 
w=l; 

best = -1; 

JO = zeros (1) ; 

while w <= n 

first = Jvals (w) ; 

firstw = w; 

energy = 0; 

while ( w <= n ) && ( Jvals (w) >= 1/2 * first ) 
energy = energy + Jvals (w)~2; 
w = w+1; 

end 

if energy > best 
best = energy; 
JO = J(f irstw:w-l) ; 

end 

end 

°/oAdd JO to the index set I 

I(length(I)+l: length(I)+length(JO) ) = JO; 

y„Update the residual 
PhiSubl = Phi(: , 1(2)); 
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for c=3:length(I) 

if ~isMember(I(2:c-l) ,I(c)) 

PhiSubK: ,c-l) =Phi(:, 1(c)); 
end 

end 

y = lscov(PhiSubI , x) ; 
r = X - PhiSubl * y; 

end % end Run IRA 

vSmall = PhiSubl \ x; 
vOut = zerosCd, 1); 
for c=2:length(I) 

vOutCKc)) = vSmall(c-l) ; 

end 



A. 4 Compressive Sampling Matching Pursuit 



%NAME: CoSaMP Tester 

"/.PURPOSE: Tests sparse signals on CoSaMP 
"/.AUTHOR: Deanna Needell 
"/.OUTSIDE FUNCTIONS: None 

"/.Testing Parameters 
sVals=[l:l:55] ; "/. Sparsity levels 
mVals=[5:5:250] ; "/.Measurement levels 
dVals=[256]; "/.dimension 

numTrials=500; "/.Number of trials per parameter set 

"/.Set Variable lengths and Data Collection 
nums=length(sVals) ; 
numm=length(mVals) ; 
numd=length(dVals) ; 

numCorrect = zeros (nums, numm, numd) ; 
trend99 = zeros(numm, 1); 



for is=l:nums 
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for im=l:nuniin 

for id=l:nunid 

"/oSet Parameters 
s = sVals(is) ; 
m = inVals(im) ; 
d = dVals(id) ; 

ZStart a trial 

for trial=l :numTrials 

%Generate Measurement matrix 
Phi = randn(m,d); 

"/oGenerate sparse signal 

z = randperm(d) ; 

X = zeros (d, 1) ; 

x(z(l:s)) = sign(randn(s , 1) ) ; 

"/oGenerate measurements 
u = Phi*x; 

"/.Begin CoSaMP 

"/olnitialize 

a = zeros(d, 1) ; 

V = u; 

it=0; 

stop = 0; 

while "stop 

"/oSignal Proxy 
y = Phi'*v; 

[tmp, ix] = sort(abs(y), 'descend'); 
Omega = ix(l :2*s) ; 

[tmp, ix] = sort(abs(a), 'descend'); 
T = unionCOmega, ix(l:s)); 

"/oSignal Estimation 
b = zerosCd, 1) ; 
b(T) = Phi(: , T) \ u; 
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"/oPrune 

[tmp, ix] = sort(abs(b), 'descend'); 

a = zeros(d, 1) ; 

a(ix(l:s), 1) = b(ix(l:s), 1); 

"/oSample Update 
V = u - Phi*a; 

"/olteration counter 
it = it + 1; 



"/oCheck Halting Condition 

if norm(a-x) <= 10" (-4) I I it > max(8*s, 60) 
stop = 1; 

end 



end %End CoSaMP iteration 



%Collect Data 

if norm(a-x) <= 10" (-4) 

numCorrect (is, im, id) = numCorrect (is , im, id) + 1; 

end 



end % End trial 
end "/od 



if trend99(im) == && numCorrect (is , im, id) >= 0.99*numTrials 
trend99(im) = s; 

end 
end °/„m 
end %s 



A. 5 Reweighted LI Minimization 

/dNAME: Reweighted Ll-Minimization Tester 

"/oPURPOSE: Tests sparse and noisy signals on Reweighted LI 

"/.AUTHOR: Deanna Needell 

'/.OUTSIDE FUNCTIONS: CVX package (Michael Grant and Stephen Boyd) 



N = 256; "/.dimension 
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M = 128; "/omeasurements 
kVals = [30] ; 7oSparsity 
eepsVals = [1] ; 
numTrials = 500; 
maxlter = 9; 

errorVecDecoding = zeros (length (kVals) , length (eepsVals) , maxlter .numTrials) ; 
errors = zeros (numTrials, 1) ; 
for trial = 1: numTrials 

for kindex = 1 : length (kVals) 
K = kVals (kindex) ; 
for eindex = 1 : length(eepsVals) 
eeps = eepsVals (eindex) ; 

% Gaussian spikes in random locations 
X = zeros(N,l); q = randperm(N) ; 
x(q(l:K)) = sign(randn(K,l)); 

% measurement matrix 

Phi = sign(randn(M,N))/sqrt(M) ; 

% observations 
err = randn(M, 1) ; 

sigma = 0.2*norm(Phi*x,2)/norm(err,2) ; 
err = sigma*err; 
y = Phi*x + err; 
errors(trial) = norm(err,2); 

for iter = l:maxlter 
"/oSet the weights 
if iter > 1 

weights = 1 . /(abs(xDecoding)+eeps/(1000*iter) ) ; 

else 

weights = l*ones(N,l); 

end 

"/oSet noise tolerance parameter 
delta=sqrt (sigma"2* (M+2*sqrt (2*M) ) ) ; 

%Use CVX to perform minimization 
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cvx_begin 
cvx_quiet (true) 

variable xa(N) ; 

minimizeC norm(diag(weights)*xa,2) ); 
subject to 

norm(Phi*xa - y, 2) <= delta; 

cvx_end 

yoCollect results 
xDecoding = xa; 

errorVecDecoding(kIndex,eIndex, iter, trial) = normC (x-xDecoding) ,2) ; 

end 

end 

end 

end 



A. 6 Randomized Kaczmarz 

°/oNAME: Randomized Kaczmarz Tester 
"/oPURPOSE: Tests RK on noisy systems 
"/oAUTHOR: Deanna Needell 
y.OUTSIDE FUNCTIONS: none 

clear all 
warning off all 

m = 100; %rows 
n=100; "/.columns 

numlts = 1000; 
numTrials = 100; 

A = zeros (numTrials, m, n) ; 
e = zeros (numTrials, m) ; 
X = zeros (numTrials , n) ; 
b = zeros (numTrials, m) ; 
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est = zeros (numTrials, numlts, n) ; %estimations 
initErr = zeros (numTrials) ; 

R = zeros (numTrials , 1); /dValue of R (as in paper) 
mu = zeros (numTrials) ; "/oCoherence 

gamma = zeros (numTrials, 1); "/oWorst error to row norm ratio 
beta = zeros (numTrials) ; "/.beta is in theorem 
errorsRK = zeros (numTrials , numlts); 
errorsCG = zeros (numTrials, numlts); 

for trial=l : numTrials 

if mod (trial, 1) == 

display ( ['Trial num2str (trial)] ) ; 

end 

ZSet matrix equation 

A (trial, :, :) = sign(randn(m,n) ) ; 

X = zeros (n, 1) ' ; 

b(trial,:) = reshape (A (trial, :, :), m, n)*(x'); 

ZSet initial guess, I |x - xOl I =1 
est (trial, 1, :) = randn(l, n) ; 

est (trial, 1, :) = est (trial, 1, :) / norm(reshape (est (trial, 1, :), 1, n),2) 

origest = reshape (est (trial, 1, :), 1, n) ' ; 

ZAdd error to RHS of Ax=0 
e (trial, :) = randnd, m)*2; 

e (trial, :) = e (trial, :) / norm(e (trial , :), 2) / 10; 
b(trial, :)=b(trial, :)+e(trial, :); "/.'/.nNOISY! ! 

"/oCalculate stats 

initErr = norm (reshape (est (trial, 1, :), 1, n) - x,2); 

fronorm = norm (reshape (A (trial, :, :), m, n), 'fro'); 

R (trial) = norm (pinv (reshape (A (trial , :, :), m, n)) ,2)*fronorm; 

temp = zeros (1, m) ; 

for i=l:m 

temp(i) = norm (reshape (A (trial, i, :), 1, n) , 2); 

end 

gamma(trial) = max(abs( e(trial, :)./temp )); 

errorsRK (trial , 1) = norm(reshape (est (trial, 1, :), 1, n)-x,2); 
errorsCG (trial, 1) = norm(reshape (est (trial, 1, :), 1, n)-x,2); 



y.Run RK 
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for it=2:numlts 

"/oSelect random hyperplane 
pick = rand * fronorm"2; 
counter = 0; 
index = 1 ; 

while counter + norm (reshape (A (trial , index, n) , 2) "2 < pick 

counter = counter + norm (reshape (A (trial , index, n) , 2)"2; 

index = index + 1; 

end 

%Modify estimation 

est (trial, it, :) = est (trial, it-1, :) + (b (index) - dot ((A (trial, index, 
(est (trial, it-1, :))) )/ (norm(reshape(A(trial, index, :), 1, n),2)~2) * 
errorsRK (trial, it) = norm(reshape (est (trial, it, :), 1, n)-x,2); 

end 

y.Run CG 

for it=l:numlts 

[estcg,flag] = cgs (reshape (A(trial, :, :),m, n) ,b(trial , : ) \ 10" (-10) , it); 
errorsCG (trial, it) = norm(estcg-x' ,2) ; 

end 



end 
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